Data blending for multiple data pipelines

ABSTRACT

A processing system including at least one processor may obtain a first request for delivery of a first data set to a first destination, map the first request to a first information model, obtain a second request for delivery of a second data set to a second destination, map the second request to a second information model, and identify that a portion of data is part of both data sets. The processing system may next determine a plan for configuring data pipeline components for delivering the first data set to the first destination and the second data set to the second destination, the plan comprising: a combination of the first information model and the second information model, and at least one modification to the combination. The processing system may then configure the data pipeline components in accordance with the plan.

This application is a continuation of U.S. patent application Ser. No. 16/832,041, filed Mar. 27, 2020, now U.S. Pat. No. 11,500,895, which is herein incorporated by reference in its entirety.

The present disclosure relates generally to data pipelines for transferring batch and streaming data via communications networks, and more particularly to methods, computer-readable media, and apparatuses for configuring data pipeline components for delivering a first data set to at least a first destination and for delivering a second data set to at least the second destination in accordance with a plan comprising a combination of a first information model associated with a first request and a second information model associated with a second request and including at least one modification to the combination.

BACKGROUND

A data pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next. The elements of a data pipeline may operate in parallel or in a time-sliced fashion. In addition, some amount of buffer storage may be provided between elements. One subset of data pipelines includes extract, transform, and load (ETL) systems, which extract data from a data source, transform the data, and load the data into a database or data warehouse. ETL pipelines may run in batches, meaning that the data is moved to the target in one large chunk at a specific time, e.g., at regularly scheduled intervals. The term data pipeline is broader and refers to a system for moving data from one or more sources to one or more targets in a computing network environment. The data may or may not be transformed, and it may be processed in real time (or streaming) instead of in batches. When the data is streamed, it may be processed in a continuous flow, which is useful for data that is continuously updating, such as data from a traffic monitoring sensor. In addition, the data may be transferred to any number of targets, which may include databases or data warehouses, as well as any number of automated systems, operator/user terminals, and so forth.
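To illustrate elements connected in series and the difference between batch and streaming operation, the following is a minimal Python sketch using generators. The element names, the record format, and the km/h to m/s transform are hypothetical and chosen only for illustration; this is not the disclosed system.

```python
from typing import Callable, Iterable, Iterator

Record = dict
# A pipeline element consumes a stream of records and yields records.
Element = Callable[[Iterator[Record]], Iterator[Record]]


def to_mps(records: Iterator[Record]) -> Iterator[Record]:
    """Hypothetical transform element: add speed in m/s to each sensor reading."""
    for r in records:
        yield {**r, "speed_mps": round(r["speed_kmh"] / 3.6, 2)}


def connect(source: Iterable[Record], *elements: Element) -> Iterator[Record]:
    """Chain elements in series so the output of one is the input of the next."""
    stream: Iterator[Record] = iter(source)
    for element in elements:
        stream = element(stream)
    return stream  # lazy: nothing runs until a consumer pulls records


def run_batch(source: Iterable[Record], *elements: Element) -> list:
    """Batch (ETL-style) mode: drain the whole chunk, then load it at once."""
    return list(connect(source, *elements))
```

In streaming mode, a consumer would instead pull records one at a time with `next()` on the object returned by `connect`, which also works for an unbounded source such as a live sensor feed, because each record flows through the elements as soon as it arrives.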

SUMMARY

Methods, computer-readable media, and apparatuses for configuring data pipeline components for delivering a first data set to at least a first destination and for delivering a second data set to at least the second destination in accordance with a plan comprising a combination of a first information model associated with a first request and a second information model associated with a second request and including at least one modification to the combination are described. For example, a processing system including at least one processor may obtain a first request for a delivery of a first data set to at least a first destination, map the first request to a first information model of a plurality of information models, obtain a second request for a delivery of a second data set to at least a second destination, map the second request to a second information model of the plurality of information models, and identify that at least a portion of data is a part of both the first data set and the second data set. The processing system may next determine a plan for configuring data pipeline components for delivering the first data set to the at least the first destination and for delivering the second data set to the at least the second destination, where the plan comprises a combination of the first information model and the second information model, and where the plan comprises at least one modification to the combination of the first information model and the second information model. The processing system may then configure the data pipeline components for delivering the first data set to the at least the first destination and for delivering the second data set to the at least the second destination in accordance with the plan.
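The first steps summarized above (mapping each request to an information model and identifying data that is part of both requested data sets) can be pictured with a minimal Python sketch. The classes, the representation of a data set as a source plus a set of fields, and the "first covering model wins" matching rule are all illustrative assumptions, not the disclosed method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationModel:
    name: str
    source: str          # assumption: one data source per model
    fields: frozenset    # data attributes the model knows how to deliver


@dataclass(frozen=True)
class Request:
    source: str
    fields: frozenset
    destination: str


def map_request(req: Request, models: list) -> InformationModel:
    """Return the first model on the same source whose fields cover the request."""
    for model in models:
        if model.source == req.source and req.fields <= model.fields:
            return model
    raise LookupError(f"no information model covers {sorted(req.fields)}")


def shared_portion(first: Request, second: Request) -> frozenset:
    """The portion of data that is part of both requested data sets."""
    if first.source != second.source:
        return frozenset()
    return first.fields & second.fields
```

A non-empty result from `shared_portion` is the signal that a combined plan, rather than two independent pipelines, may be worth considering.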

BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present disclosure can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1 illustrates an example network related to the present disclosure;

FIG. 2 illustrates a flowchart of an example method for generating a data schema for a type of data pipeline component and storing an ontology and the data schema for the type of data pipeline component in a catalog of data pipeline component types;

FIG. 3 illustrates a high level block diagram of a computing device specifically programmed to perform the steps, functions, blocks and/or operations described herein;

FIG. 4 illustrates example scenarios of data blending for different data pipelines, in accordance with the present disclosure; and

FIG. 5 illustrates a flowchart of an example method for configuring data pipeline components for delivering a first data set to at least a first destination and for delivering a second data set to at least the second destination in accordance with a plan comprising a combination of a first information model associated with a first request and a second information model associated with a second request and including at least one modification to the combination.

To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

DETAILED DESCRIPTION

Examples of the present disclosure include a system for data pipeline configuration and management, which may be referred to as a data pipeline controller, or a Data Pipeline Intelligent Controller (DPIC). A data pipeline controller may control all the elements of a data pipeline to enable the data pipeline to create a suitable response to satisfy a client request. The functions or modules of a data pipeline controller may include, but are not limited to: schedulers, request interpreters, various artificial intelligence/machine learning modules, policy functions, security and privacy enforcement modules, assurance functions, negotiation functions, orchestrators, databases, an abstract symbol manipulator module, a model data schema generator/updater, and so forth.

In one example, a data pipeline controller of the present disclosure may create new schemas to handle new source data retrievals and/or to integrate new data pipeline component types, and may assemble and tear down data pipelines in real time. In one example, a data pipeline controller is flexibly expandable via add-ons, plug-ins, helper applications, and the like. When a client, such as a data scientist, a network operator, or the like, seeks to obtain specified data sets from multiple sources, e.g., to provide to one or more machine learning models as target(s), the client may provide the request by specifying the desired data and the desired target(s), and the data pipeline controller may automatically generate an end-to-end plan to obtain and transmit the right data from the right source(s) to the right target(s). Thus, the present disclosure provides for intelligent control of data pipelines via a data pipeline controller that automatically integrates and directs data pipeline components at a higher level of abstraction. Data pipelines may be constructed dynamically and on an as-needed basis, such that even complex or demanding client requests may be fulfilled without (or with minimal) human interaction, and without component-specific human expertise regarding the various data pipeline components.

In many cases, a data pipeline or its associated support functions are in existence, but the data pipeline itself may be inactive. In other cases, a data pipeline may not be physically or virtually established, but all the support functions are available in the cloud. In response to a request for data transfer, examples of the present disclosure may activate an inactive data pipeline or may form a new data pipeline in real time. Examples of the present disclosure may further include features for: security, access, authentication, and authorization (AAA) (for instance, a requestor may not have the right to a data set, in which case the present disclosure may take the role of gaining rights to the protected data set(s)); accounting services; proxy creation; protocol setting; payment settlement; and so on.

In one example, a data pipeline component discovery module of a data pipeline controller continuously discovers new or changed conditions in a data pipeline infrastructure. In one example, the data pipeline controller may determine how to fulfill data requests with alternative mechanisms. For instance, the data pipeline controller may determine if intermediate nodes or data stores could be established to improve efficiency or other performance/quality aspects. In one example, a result of a request may be stored as a copy in a source node, in a specified intermediate node, or at one or more target nodes, such that the result may be reused for one or more subsequent requests. The purpose is not to replace the data pipeline's native data fulfillment functions, but rather to assist, suggest, or command how the data pipeline handles its fulfillment aspects.
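The reuse of a stored result can be pictured as a small cache kept at an intermediate node. The following Python sketch is illustrative only: the class name, the canonical-JSON keying of request specifications, and the fetch callback are assumptions rather than part of the disclosure.

```python
import hashlib
import json
from typing import Callable


class IntermediateNodeStore:
    """Hypothetical copy of request results kept at an intermediate node."""

    def __init__(self) -> None:
        self._results: dict = {}
        self.source_fetches = 0  # how many times the source actually had to be read

    @staticmethod
    def key(spec: dict) -> str:
        # Canonical JSON so that equivalent specs map to the same key.
        canonical = json.dumps(spec, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def fulfill(self, spec: dict, fetch_from_source: Callable[[dict], list]) -> list:
        k = self.key(spec)
        if k not in self._results:
            self.source_fetches += 1
            self._results[k] = fetch_from_source(spec)
        return self._results[k]
```

This sketch never expires an entry; a practical store would also need to check whether the source has been updated since the copy was made before reusing it.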

Examples of the present disclosure also ensure that data is well understood. For instance, data sources may be indexed and a requestor may learn upfront what data is available. In accordance with the present disclosure, a data pipeline may be dynamically established and subsequently torn down. Thus, a data pipeline may not always be a persistent entity. In one example, a data pipeline controller of the present disclosure is aware of each data pipeline that is in existence, and knows each data pipeline's history. In addition, in one example, if a request cannot be automatically satisfied, the data pipeline controller may provide a meaningful explanation of the gaps, which may allow data scientists working offline to improve tools/modules at the data pipeline level.

A data pipeline controller of the present disclosure and/or various modules thereof may be configured for several use patterns, e.g., including but not limited to: inquiry/browsing, request template/specification and analysis/planning, data source/data pipeline indexing, notification, and request and fulfillment. Interactions of the data pipeline controller with other entities in these patterns may be via any appropriate means, such as: direct or indirect communications; forwarded, routed, or switched communications; application programming interfaces (APIs); bus messages; subscribe-publish events; etc.

Inquiry/browsing—This pattern may be used to verify if a data pipeline controller can arrange the fulfillment of an inquiry. For example, a requestor may browse a data pipeline catalog to select particular data or data set(s), and may send an inquiry to the data pipeline controller, which may then determine and respond with availability (and potentially commitments, reservations, verifications, etc.) along with associated information related to the data/data set(s) that is/are identified in the inquiry, such as: estimated freshness, latency, quality, etc.
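An inquiry response of the kind described above might be assembled from catalog metadata as in the following Python sketch. The catalog fields (last update time, estimated latency, a 0.0 to 1.0 quality score) and the function name are hypothetical; the disclosure does not specify how freshness, latency, or quality are represented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CatalogEntry:
    data_set: str
    last_updated: datetime
    est_latency_s: float
    quality: float  # hypothetical 0.0-1.0 quality score


def answer_inquiry(catalog: dict, data_set: str, now: datetime) -> dict:
    """Report availability plus freshness, latency, and quality for one data set."""
    entry = catalog.get(data_set)
    if entry is None:
        return {"data_set": data_set, "available": False}
    return {
        "data_set": data_set,
        "available": True,
        "freshness_s": (now - entry.last_updated).total_seconds(),
        "est_latency_s": entry.est_latency_s,
        "quality": entry.quality,
    }
```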

Request template/specification and analysis/planning—A requestor may send an actual request to the data pipeline controller for simulated processing, such as a particular template/specification of desired data or data set(s). The data pipeline controller may command and coordinate with data pipeline components to perform analysis, search, planning of functional steps, and so forth, in order to provide informative responses. For example, in some cases the data pipeline controller may return one or more of three potential responses: (1) a request that more information be provided, (2) an indication that special authorization may be needed, and (3) example(s) of a full data/data set response (if possible) or of a partial data/data set response (e.g., if the requested data/data set(s) is/are large, if “1” or “2” also apply, etc.).
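The three potential responses can be modeled as a small enumeration returned by a dry run of the request. In this Python sketch, the required specification keys, the authorization set, the sample source, and the row limit are all assumptions made for illustration.

```python
from enum import Enum, auto


class Planning(Enum):
    NEED_MORE_INFO = auto()      # response (1)
    NEED_AUTHORIZATION = auto()  # response (2)
    SAMPLE = auto()              # response (3)


def simulate_request(spec: dict, authorized: set, samples: dict, max_rows: int = 3) -> list:
    """Dry-run a request and collect the informative responses that apply."""
    responses = []
    missing = [k for k in ("data_set", "target") if k not in spec]
    if missing:
        responses.append((Planning.NEED_MORE_INFO, missing))
    data_set = spec.get("data_set")
    if data_set is not None and data_set not in authorized:
        responses.append((Planning.NEED_AUTHORIZATION, data_set))
    if data_set in samples:
        rows = samples[data_set]
        # Partial if the data set is large or if response (1) or (2) also applies.
        partial = len(rows) > max_rows or bool(responses)
        responses.append((Planning.SAMPLE, {"rows": rows[:max_rows], "partial": partial}))
    return responses
```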

In one example, information models may have associated request templates, which may be predefined (e.g., by a creator/administrator of an information model) and/or which may be learned over time as requests are matched to different information models, as feedback on the quality and correctness of the matching is provided by client request submitters, and so forth. In one example, multiple request templates may be stored and maintained in association with an information model. For instance, the same information model may be matched to different requests, which may all relate to a same general type of data delivery, but with somewhat different specifics, such as one or more different data sources, one or more different targets, with or without an intermediate storage node, etc.

It should be noted that information models and associated request templates may have more or less detail, and more or less fixed and/or configurable parameters depending upon the preferences of a system operator, a creator of an information model, etc. For instance, in one example, an information model and/or an associated request template may be for obtaining specific data from specific data sources and delivering to selected targets. In other words, the data and data sources may be fixed and are not further selectable (at least with this particular information model). However, another information model may be for obtaining selectable data from selectable data sources within a specific area for delivery to selectable targets. In other words, the location or region may be fixed, while the data and the data sources are not fixed and can be selected (e.g., via a request that is crafted in accordance with an associated request template and/or via a custom crafted request that is mapped to the information model). In one example, the request template/specification and analysis/planning use pattern may include providing access to a catalog of request templates from which a client may select a template for use (e.g., for simulated or actual fulfillment of a request).
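The distinction between fixed and selectable parameters can be sketched as a request template that validates a client's selections. In this Python sketch, the class, the parameter names, and the "every selectable parameter must be chosen" rule are hypothetical; the regional template mirrors the case above in which the region is fixed while the data and targets remain selectable.

```python
from dataclasses import dataclass, field


@dataclass
class RequestTemplate:
    name: str
    fixed: dict                                     # parameters pinned by the model
    selectable: dict = field(default_factory=dict)  # parameter -> allowed values

    def instantiate(self, choices: dict) -> dict:
        """Validate a client's selections and return a complete request spec."""
        for key, values in choices.items():
            if key in self.fixed:
                raise ValueError(f"{key!r} is fixed by template {self.name!r}")
            if key not in self.selectable:
                raise ValueError(f"{key!r} is not a parameter of {self.name!r}")
            not_allowed = set(values) - set(self.selectable[key])
            if not_allowed:
                raise ValueError(f"{sorted(not_allowed)} not allowed for {key!r}")
        missing = set(self.selectable) - set(choices)
        if missing:
            raise ValueError(f"missing selections: {sorted(missing)}")
        return {**self.fixed, **choices}
```

Several such templates could be kept in association with one information model, each pinning a different subset of parameters.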

Data source/data pipeline indexing—The data pipeline controller may add new data sources (or even full data pipelines) to a catalog of data pipeline infrastructure components.

Notification—The data pipeline controller may notify requestors and/or subscribers of new data pipeline components or data pipeline component types. For instance, when a new data pipeline component or data pipeline component type is discovered, the data pipeline controller may notify previous requestors and/or publish/post notifications to those who previously subscribed to the notification messages (e.g., of the particular scope of the new findings).
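Scoped notifications of this kind map naturally onto a publish/subscribe hub. This Python sketch assumes hierarchical scope strings (e.g., a subscriber to "component_type" also hears about "component_type/collector"); that scoping convention and the class name are illustrative assumptions, not part of the disclosure.

```python
from collections import defaultdict
from typing import Callable


class Notifier:
    """Hypothetical publish/subscribe hub keyed by hierarchical scope strings."""

    def __init__(self) -> None:
        self._subscribers: dict = defaultdict(list)  # scope -> callbacks

    def subscribe(self, scope: str, callback: Callable[[str], None]) -> None:
        self._subscribers[scope].append(callback)

    def publish(self, scope: str, finding: str) -> int:
        """Deliver to subscribers of this scope or of any parent scope."""
        delivered = 0
        for sub_scope, callbacks in list(self._subscribers.items()):
            if scope == sub_scope or scope.startswith(sub_scope + "/"):
                for callback in callbacks:
                    callback(finding)
                    delivered += 1
        return delivered
```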

Request and fulfillment—Stored data set(s) or stream data may be obtained by a requestor or an automated system sending a request or trigger to the data pipeline controller. The request/trigger may be simple in some cases, but may be expected to include (directly or by reference) detailed specification information such that the appropriate data or data set(s) can be identified, prepared, and provided. In one example, the data pipeline controller may first check if the same or a similar request has recently gone through the request template/specification and analysis/planning pattern (e.g., as outlined above), and if so, some portions of the fulfillment process may be omitted for the sake of efficiency (e.g., if various safety/quality assurance criteria are met). For instance, the request specification may be sent to data sources and resulting data may be joined in appropriate node(s) in order to avoid unnecessary work, with final data/data set(s) then being delivered to the requestor.
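The check for a recently planned request can be sketched as a plan cache with a time-to-live. This Python sketch only recognizes identical specifications (matching "similar" requests would need a similarity measure not shown here), and the TTL value, the injectable clock, and the planner callback are assumptions for illustration.

```python
import time
from typing import Callable


class PlanCache:
    """Reuse a recent analysis/planning result for an identical request spec."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s      # hypothetical "recently" threshold, in seconds
        self.clock = clock
        self._plans: dict = {}  # canonical spec -> (time planned, plan)

    def get_or_plan(self, spec: dict, plan_fn: Callable[[dict], list]) -> tuple:
        key = tuple(sorted(spec.items()))  # assumes hashable spec values
        now = self.clock()
        hit = self._plans.get(key)
        if hit is not None and now - hit[0] <= self.ttl_s:
            return hit[1], True  # planning step skipped
        plan = plan_fn(spec)
        self._plans[key] = (now, plan)
        return plan, False
```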

Feedback—This pattern enables a requestor to provide feedback to the data pipeline controller regarding its automated actions. For instance, a data requestor may provide data usage/quality feedback to the data pipeline controller, which can then use the feedback to fine-tune various relevant data manipulation processes.

Discovery—This pattern enables the data pipeline controller to discover functionalities of data pipeline functions. The discovery pattern may include two aspects: (1) proactive discovery, in which a pre-specified model (e.g., an information model) may be provided to the data pipeline controller, and, based on scheduling and the information model specification, the data pipeline controller may proactively discover newly formed data pipeline components (and/or data pipeline component types) or may discover updates to data pipeline components (and/or data pipeline component types) that may have been modified; and (2) reactive discovery, in which each data pipeline component, once instantiated or modified, may notify the data pipeline controller of its existence. In some cases, where the data pipeline controller engages in a proactive discovery role, the data pipeline controller may follow what is defined in an information model and may verify the existence of underlying data pipeline components (e.g., one or more instances of data pipeline component types which is/are identified in the information model). An information model may also be leveraged in a “reactive” model. In this case, data pipeline components may notify the data pipeline controller of the components' whereabouts and details.
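The two discovery aspects can be combined in a single registry, as in the following Python sketch. The `probe` callback stands in for whatever scheduled scan a controller might perform, and the component identifiers and type names are hypothetical.

```python
from typing import Callable, Iterable


class ComponentRegistry:
    """Hypothetical controller-side view of known data pipeline components."""

    def __init__(self) -> None:
        self.known: dict = {}  # component id -> component type

    def on_notify(self, component_id: str, component_type: str) -> None:
        """Reactive discovery: a component announces itself when created or modified."""
        self.known[component_id] = component_type

    def verify_model(self, required_types: Iterable[str], probe: Callable[[], dict]) -> list:
        """Proactive discovery: probe the infrastructure, then report which
        component types required by an information model have no instance."""
        self.known.update(probe())
        present = set(self.known.values())
        return sorted(set(required_types) - present)
```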

In addition, in one example, when the data pipeline controller becomes aware of a new data source or other data pipeline components (or a new data source type and/or a new data pipeline component type), the data pipeline controller may attempt to derive a default data schema (and, for a new data source, to also profile the data). The data schema may be in terms of the symbols that the data pipeline controller is made aware of (e.g., from a provided ontology). A system operator may also validate or correct the automatically-generated data schema. Additionally, the data pipeline controller may validate fresh batches of data from a data source against a previously defined data schema, and any differences in the statistical profile of the new batch versus previous batches may be noted.
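Deriving a default schema, profiling the data, and checking a fresh batch for schema violations and profile drift might look like the following Python sketch. Using Python type names as the schema vocabulary, per-field means as the profile, and a 25% relative-change tolerance are simplifying assumptions; the disclosure instead describes schemas expressed in ontology symbols.

```python
from statistics import mean


def infer_schema(records: list) -> dict:
    """Derive a default schema: field name -> Python type name of first value seen."""
    schema: dict = {}
    for record in records:
        for name, value in record.items():
            schema.setdefault(name, type(value).__name__)
    return schema


def profile(records: list, schema: dict) -> dict:
    """Simple statistical profile: the mean of each numeric field."""
    return {name: mean(r[name] for r in records)
            for name, kind in schema.items() if kind in ("int", "float")}


def validate_batch(batch: list, schema: dict, baseline: dict, tolerance: float = 0.25):
    """Return (schema violations, numeric fields whose mean drifted beyond tolerance)."""
    violations, conforming = [], []
    for i, record in enumerate(batch):
        wrong = sorted(n for n, kind in schema.items() if type(record.get(n)).__name__ != kind)
        extra = sorted(set(record) - set(schema))
        if wrong or extra:
            violations.append((i, wrong, extra))
        else:
            conforming.append(record)
    current = profile(conforming, schema) if conforming else {}
    drift = {n: (baseline[n], v) for n, v in current.items()
             if baseline.get(n) and abs(v - baseline[n]) / abs(baseline[n]) > tolerance}
    return violations, drift
```

A system operator reviewing the inferred schema could correct it (for example, widening a field from `int` to `float`) before it is used to validate later batches.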

Thus, examples of the present disclosure provide a framework for a data pipeline controller that supports both data request and data fulfillment. Users no longer need to know the details of how to acquire or reformat the data sets. This is handled by the data pipeline controller configuring the data pipeline instances. The data pipeline controller comprises various modules which collectively function to decompose a single data request into sub-parts. In one example, a data pipeline controller of the present disclosure may dynamically decide alternative ways to obtain the requested data set(s) when one or more data sources are not available. Based on a request, a data pipeline controller may dynamically command a data pipeline to create intermediate nodes which can, for example, act as temporary staging points to optimally accomplish sharing/reuse for performance gains. In addition, a data pipeline controller of the present disclosure may generate data schema(s) for new types of data sources and/or data pipeline components (e.g., when data schemas are not provided with these new components).

In one example, a data pipeline controller architecture may include three high level subsystems: support modules, which discover components within a data pipeline environment, including modules, technologies, collectors, filters, etc.; management & assembly modules, which can read ontologies and use newly discovered information regarding data pipelines and/or data pipeline components to refine information model(s) in real time (the information model contains a specification indicating how an existing pipeline structure can be enhanced or how a new pipeline structure can be established); and request fulfillment modules, which fulfill client data requests and hide execution details from the clients (request fulfillment modules will use the predefined information models to dynamically service each data request). In accordance with the present disclosure, a data blending module may be included in a blending layer that is added in the middle of the three abovementioned subsystems. The functions provided by this blending layer may be used by all three subsystems as described below. In another example, the data blending module may be included in the support modules or may be in communication with other component modules of a data pipeline controller via a high-speed bus.

In accordance with the present disclosure, a data blending module may establish a plan for blending data on behalf of a single request or multiple requests from multiple clients (e.g., for potential reuse and data processing efficiency). For example, when multiple data delivery requests are received, the data blending module may be invoked to provide an analysis to determine if duplicated datasets are involved. The data blending module may also check if all or a portion of a requested dataset or datasets is available in one or more of the intermediate nodes. The data blending module may also check if the data source(s) have been updated (e.g., while the intermediate node(s) may not have been synced/updated). In one example, the data blending module may determine a most efficient pipeline configuration to deliver data from the source(s) to the target(s) (or at least a more efficient pipeline configuration as compared to independently establishing data pipelines for each request according to respective information models associated with each request), e.g., with the fewest intermediate processing steps (or at least a reduced number of intermediate processing steps). In some cases, the data blending module may determine to reuse a dataset residing at a target by redirecting it to a new target. In one example, the data blending module may create two data pipelines (or two separate paths in a single pipeline) to allow source data to be delivered to a target via multiple paths. In one example, when two paths for the same dataset are routed to two different intermediate nodes, real data movement occurs from the source, which may avoid making duplicates from a single intermediate node. In one example, this enables a “true” redundant copy to be created in a regular session.
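The efficiency comparison against independently established pipelines can be made concrete with a small Python sketch. Here, overlapping requests on the same source are fetched once (as the union of their fields) into a staging node, which is then filtered per target; the staging node plays the role of a modification to the simple combination of the two requests. The step tuples, the "staging:<source>" naming, and "fields read from the source" as the cost measure are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryRequest:
    source: str
    fields: frozenset
    target: str


def independent_plan(requests: list) -> list:
    """Baseline: one pipeline per request, each reading its own data from the source."""
    return [("fetch", r.source, tuple(sorted(r.fields)), r.target) for r in requests]


def blended_plan(requests: list) -> list:
    """Fetch overlapping requests from a source once into a staging node, then filter per target."""
    by_source: dict = {}
    for r in requests:
        by_source.setdefault(r.source, []).append(r)
    steps = []
    for source, group in by_source.items():
        union = frozenset().union(*(r.fields for r in group))
        overlaps = sum(len(r.fields) for r in group) > len(union)
        if len(group) == 1 or not overlaps:
            steps.extend(independent_plan(group))
            continue
        staging = f"staging:{source}"  # hypothetical intermediate node name
        steps.append(("fetch", source, tuple(sorted(union)), staging))
        steps.extend(("filter", staging, tuple(sorted(r.fields)), r.target) for r in group)
    return steps


def fields_read_from_source(plan: list) -> int:
    return sum(len(step[2]) for step in plan if step[0] == "fetch")
```

For two requests that both need `cell_id` from the same source, the blended plan reads three fields from the source instead of four; the saving grows with the size of the shared portion.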

In one example, the data blending module is capable of overriding, and is configured to override, any existing data pipeline functions. From an external perspective, the existing data pipeline functions may appear to be unaltered, since the ultimate results are the same. However, internally, some of the data delivery paths (e.g., data pipelines) may be rearranged. This may also help to deter attacks on the data pipeline infrastructure. For instance, an attacker may not know the real path(s) of the data pipeline(s). In addition, in one example, a data blending module of the present disclosure is capable of handling stored datasets as well as streaming datasets, or data streams.

In one example, a data blending module supports “full blending,” “partial blending,” or “any level of blending via policy enablement.” For example, when a request is received, the data blending module may determine that a full blending may violate some authentication rules or data access rules. As long as a subset of a full dataset is allowed to be retrieved, the data blending module may establish a partial blending to facilitate the creation of an intermediate dataset, and then, from the intermediate dataset, the filtering-out of appropriate data attributes to satisfy individual data request(s). In one example, the data blending module assumes a “blending as a default” setting, unless otherwise specified via a “class setting.” For instance, data requests may be classified from 1 to 10, or as otherwise specified by a provider of the data pipeline environment. In one example, classes 1-3 may be reserved for special requests (e.g., data requests for state or federal governmental entities, corporate data requests via public cloud, etc.). For these classes, data blending may be provided when specifically authorized by a requesting client or when authorized by an organization responsible for the requesting client. In one example, when data blending is not set as a default, “reverse hooks” may be specified in an information model or controlled by “policies.” On the other hand, classes 4-10 may allow data blending automatically, or by default.
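The class-based policy and the fallback from full to partial blending can be expressed as a single decision function. In this Python sketch, the class boundaries follow the example above (classes 1-3 need explicit authorization, classes 4-10 blend by default), while modeling access rules as an `allowed` set of attributes is an assumption made for illustration.

```python
def blending_mode(request_class: int, authorized: bool,
                  requested: set, allowed: set) -> tuple:
    """Decide full, partial, or no blending for a request (hypothetical policy).

    Classes 1-3 blend only with explicit authorization; classes 4-10 blend by
    default. `allowed` is the subset of attributes that access rules permit in
    a shared intermediate dataset.
    """
    if not 1 <= request_class <= 10:
        raise ValueError("request class must be between 1 and 10")
    if request_class <= 3 and not authorized:
        return "none", frozenset()
    permitted = frozenset(requested) & frozenset(allowed)
    if permitted == frozenset(requested):
        return "full", permitted
    if permitted:
        return "partial", permitted  # blend only what the rules allow, filter the rest
    return "none", frozenset()
```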

Thus, based on policies and information modeling, a data blending module may dynamically configure intermediate nodes to perform a variety of data operations. The data blending module may provide any or all of the following functions: the data blending module may collectively process multiple requests for data delivery to determine optimal or more efficient delivery paths for each data request in a secure and reliable way; the data blending module may follow policies and information models (tuned periodically by machine learning modules) to establish blended pipelines or paths; the data blending module may dissect dataset(s) into appropriate chunks to facilitate reuse; and the data blending module may blend datasets and store the datasets in various intermediate nodes for reliability, redundancy, and/or reuse. In one example, data blending may be offered as a service. For example, a third-party provider may offer an umbrella of alternative blending capabilities that can be subscribed to by clients. This also enables an open architecture for a traditionally closed data pipeline environment. These and other aspects of the present disclosure are described in greater detail below in connection with the examples of FIGS. 1-5.
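One way to dissect datasets into chunks for reuse is content addressing, in which each chunk is stored once under its hash so that data sets sharing chunks share storage at an intermediate node. The following Python sketch uses fixed-size chunks of newline-free string records; both the chunking scheme and the class are assumptions, since the disclosure does not specify how chunks are formed.

```python
import hashlib


class ChunkStore:
    """Hypothetical intermediate-node store that keeps each distinct chunk once."""

    def __init__(self, chunk_size: int = 2) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict = {}  # SHA-256 digest -> list of records

    def put(self, records: list) -> tuple:
        """Split a data set into fixed-size chunks; return (digests, newly stored count)."""
        digests, stored = [], 0
        for start in range(0, len(records), self.chunk_size):
            chunk = records[start:start + self.chunk_size]
            digest = hashlib.sha256("\n".join(chunk).encode("utf-8")).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                stored += 1
            digests.append(digest)
        return digests, stored

    def get(self, digests: list) -> list:
        """Reassemble a data set from its chunk digests."""
        return [record for d in digests for record in self.chunks[d]]
```

Fixed-size chunks only line up when shared records sit at the same offsets; content-defined chunking would tolerate insertions better, at the cost of more complexity.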

To further aid in understanding the present disclosure, FIG. 1 illustrates an example system 100 in which examples of the present disclosure for generating a data schema for a type of data pipeline component and storing an ontology and the data schema for the type of data pipeline component in a catalog of data pipeline component types and/or for configuring data pipeline components for delivering a first data set to at least a first destination and for delivering a second data set to at least the second destination in accordance with a plan comprising a combination of a first information model associated with a first request and a second information model associated with a second request and including at least one modification to the combination may operate. The system 100 may include any one or more types of communication networks, such as a traditional circuit switched network (e.g., a public switched telephone network (PSTN)) or a packet network such as an Internet Protocol (IP) network (e.g., an IP Multimedia Subsystem (IMS) network), an asynchronous transfer mode (ATM) network, a wireless network, a cellular network (e.g., 2G, 3G, 4G, 5G and the like), a long term evolution (LTE) network, and the like, related to the current disclosure. It should be noted that an IP network is broadly defined as a network that uses Internet Protocol to exchange data packets. Additional example IP networks include Voice over IP (VoIP) networks, Service over IP (SoIP) networks, and the like.

In one example, the system 100 may comprise a telecommunication network 101. Telecommunication network 101 may combine core network components of a cellular network with components of a triple-play service network, where triple-play services include telephone services, Internet services, and television services provided to subscribers. For example, telecommunication network 101 may functionally comprise a fixed mobile convergence (FMC) network, e.g., an IP Multimedia Subsystem (IMS) network. In addition, telecommunication network 101 may functionally comprise a telephony network, e.g., an Internet Protocol/Multi-Protocol Label Switching (IP/MPLS) backbone network utilizing Session Initiation Protocol (SIP) for circuit-switched and Voice over Internet Protocol (VoIP) telephony services. Telecommunication network 101 may further comprise a broadcast television network, e.g., a traditional cable provider network or an Internet Protocol Television (IPTV) network, as well as an Internet Service Provider (ISP) network. In one example, telecommunication network 101 may include a plurality of television (TV) servers (e.g., a broadcast server, a cable head-end), a plurality of content servers, an advertising server (AS), an interactive TV/video on demand (VoD) server, and so forth. For ease of illustration, various additional elements of telecommunication network 101 are omitted from FIG. 1.

The telecommunication network 101 may be in communication with data pipeline infrastructure 120 and the Internet in general (not shown). In one example, the data pipeline infrastructure 120 may comprise “public” cloud or “private” cloud infrastructure. For instance, all or a portion of the data pipeline infrastructure 120 may be controlled by a same entity as telecommunication network 101. In such an example, the data pipeline infrastructure 120 may be considered part of the telecommunication network 101. Alternatively, or in addition, all or a portion of the data pipeline infrastructure 120 may be controlled by and/or operated by another entity providing cloud computing services to clients/subscribers. The data pipeline infrastructure 120 may include a plurality of data pipeline components 127, such as adapters, collectors, intermediate nodes, forwarders, data stores, and so forth. The data pipeline infrastructure 120 may comprise servers/host devices (e.g., computing resources comprising processors, e.g., central processing units (CPUs), graphics processing units (GPUs), programmable logic devices (PLDs), such as field programmable gate arrays (FPGAs), or the like, memory, storage, and so forth), which may provide virtualization platforms for managing one or more virtual machines (VMs), containers, microservices, or the like. For instance, in such a case the data pipeline components 127 may comprise virtual machines, containers, microservices, or the like, which may provide the various functions of data pipeline components, such as a collector, an adapter, a forwarder, etc. In one example, the data pipeline components 127 may also include dedicated hardware devices, e.g., one or more servers that may comprise one or more adapters, collectors, intermediate nodes, etc., and which may be configured to operate in various data pipelines (but which may not be readily adaptable to provide a different type of service).
In one example, the data pipeline components may each comprise a computing system or server, such as computing system 300 depicted in FIG. 3, and may be configured to provide one or more operations or functions in connection with examples of the present disclosure for generating a data schema for a type of data pipeline component and storing an ontology and the data schema for the type of data pipeline component in a catalog of data pipeline component types and/or for configuring data pipeline components for delivering a first data set to at least a first destination and for delivering a second data set to at least the second destination in accordance with a plan comprising a combination of a first information model associated with a first request and a second information model associated with a second request and including at least one modification to the combination, as described herein.

In one example, the data pipeline infrastructure 120 may also include one or more data sources 125 and one or more targets 129. However, in another example, these devices or systems may be considered to be outside the data pipeline infrastructure 120. The data sources 125 may include network devices, e.g., routers, switches, multiplexers, firewalls, traffic shaping devices or systems, base stations, remote radio heads, baseband units, gateways, and so forth. The data from the data sources 125 may therefore comprise various types of network operational data, such as: channel quality information, a number of endpoint devices served by a base station, records and/or alerts regarding network anomaly detections, throughput information, link connectivity information, port utilization metrics, and so on. In one example, the data sources 125 may alternatively or additionally comprise sensor devices, e.g., temperature sensors, humidity sensors, wind speed sensors, magnetometers, pressure sensors, etc. Thus, the data from data sources 125 may comprise measurements of temperature, humidity, wind speed, pressure, magnetic field strength and/or direction, and so forth. In still another example, the data sources 125 may alternatively or additionally include digital still and/or video cameras, photograph and/or video repositories, medical imaging repositories, financial data storage systems, medical records storage systems, and so forth. Accordingly, the data that is available from data sources 125 may alternatively or additionally include images, videos, documents, and so forth. It should be noted that data from various data sources 125 may be filtered and transformed to achieve one or more data sets and/or subsets of data that can be common across a set of data pipelines and data pipeline instances. In one example, the targets 129 may comprise various devices and/or processing systems, which may include various machine learning (ML) modules hosting one or more machine learning models (MLMs).
For instance, a first one of the targets 129 may comprise a MLMto process image data and may be trained to recognize images ofdifferent animals, a second one of the targets 129 may comprise a MLM toprocess financial data and may be trained to recognize and alert forunusual account activity, and so forth. Targets 129 may also includeuser endpoint devices, storage devices, and so forth.

As further illustrated in FIG. 1, telecommunication network 101 may include a data pipeline controller 110. In one example, the data pipeline controller 110 may comprise a computing system or server, such as computing system 300 depicted in FIG. 3, and may be configured to provide one or more operations or functions for generating a data schema for a type of data pipeline component and storing an ontology and the data schema for the type of data pipeline component in a catalog of data pipeline component types and/or for configuring data pipeline components for delivering a first data set to at least a first destination and for delivering a second data set to at least the second destination in accordance with a plan comprising a combination of a first information model associated with a first request and a second information model associated with a second request and including at least one modification to the combination, as described herein. For instance, a flowchart of an example method 200 for generating a data schema for a type of data pipeline component and storing an ontology and the data schema for the type of data pipeline component in a catalog of data pipeline component types is illustrated in FIG. 2 and described in greater detail below. In addition, a flowchart of an example method 500 for configuring data pipeline components for delivering a first data set to at least a first destination and for delivering a second data set to at least the second destination in accordance with a plan comprising a combination of a first information model associated with a first request and a second information model associated with a second request and including at least one modification to the combination is illustrated in FIG. 5 and described in greater detail below.

It should be noted that as used herein, the terms “configure” and “reconfigure” may refer to programming or loading a processing system with computer-readable/computer-executable instructions, code, and/or programs, e.g., in a distributed or non-distributed memory, which when executed by a processor, or processors, of the processing system within a same device or within distributed devices, may cause the processing system to perform various functions. Such terms may also encompass providing variables, data values, tables, objects, or other data structures or the like which may cause a processing system executing computer-readable instructions, code, and/or programs to function differently depending upon the values of the variables or other data structures that are provided. As referred to herein, a “processing system” may comprise a computing device including one or more processors, or cores (e.g., as illustrated in FIG. 3 and discussed below), or multiple computing devices collectively configured to perform various steps, functions, and/or operations in accordance with the present disclosure.

In one example, the data pipeline controller 110 may include a plurality of modules 111-119 which provide for particular functions of the data pipeline controller 110. For instance, each component/module 111-119 may comprise respective code, executable images, etc., that can be loaded into memory and executed by one or more processors to collectively comprise an operational data pipeline controller 110.

As noted above, each of the data pipeline components 127 may have a data pipeline component type, such as an adapter, collector, forwarder, etc. In one example, for each data pipeline component type, the data pipeline controller 110 may store a respective data schema in the ontology and data schema repository 115. A data schema for a data pipeline component type establishes how a function of a data pipeline component (of the data pipeline component type) is performed at runtime. It includes relationships among data attributes along with a mini-flow (or micro-level flow sequence). In addition, for each data pipeline component type, the ontology and data schema repository 115 may also store a respective ontology for the data pipeline component type. An ontology defines what an instance of the data pipeline component type is (e.g., Vendor 3 Adapter 6 Version 2) and the functions of that instance, but does not define how the functions are used; this is provided by the data schema. It should also be noted that insofar as the data sources 125 and targets 129 may comprise part of a data pipeline, these devices or systems may also have respective data pipeline component types for which respective ontologies and associated data schemas may be stored by the ontology and data schema repository 115.

In general, an ontology defines classes (also referred to as “concepts” or “attributes”) and properties (also referred to as “slots”) defining features of the classes. As described herein, each data pipeline component type has its own ontology. However, in some taxonomies, each data pipeline component type may comprise its own “class” in a singular ontology or knowledge base of “data pipeline component types,” with additional attributes of the data pipeline component type comprising “sub-classes” in one or more layers below the “class” layer. The ontologies for different data pipeline component types may thus be considered “classes” according to some interpretations. In one example, the format of an ontology may be defined by an operator of the data pipeline controller 110. For instance, an ontology format may have a hierarchy of layers or levels, and may specify certain required classes, certain required properties, certain required class restrictions, certain required values for one or more properties, and so on.

In one example, for each new data pipeline component type that becomes available, a vendor may provide an associated ontology. In some cases, a vendor of a new data pipeline component type may also provide an associated data schema. This is illustrated in FIG. 1, where an ontology and/or data schema for a data pipeline component type 190 may be input to the data schema generator/updater module 116. For instance, the ontology and/or data schema for a data pipeline component type 190 may be provided via one of vendor devices 185. In an example where the vendor has provided both an ontology and a data schema, the data schema generator/updater module 116 may simply store a record for the new data pipeline component type comprising the ontology and the data schema in the ontology and data schema repository 115. However, where only an ontology is provided, the data schema generator/updater module 116 may automatically generate a data schema based upon the ontology and store the record comprising the ontology and the data schema in the ontology and data schema repository 115.

In particular, the data schema generator/updater module 116 may determine a similarity between the new type of data pipeline component and one or more existing types of data pipeline components having records in the ontology and data schema repository 115. In one example, the similarity between the new type of data pipeline component and an existing type of data pipeline component may be quantified based upon a congruence between the ontology of the new type of data pipeline component (e.g., a first ontology) and the ontology of the existing type of data pipeline component (e.g., a second ontology). For example, the congruence may be based upon a number of matches between classes, properties, and/or class restrictions (broadly, “features”) of the first ontology and the classes, properties, and/or class restrictions (e.g., “features”) of the second ontology. In one example, different weights may be applied to matches among different features, e.g., depending upon the level of the features within a hierarchy of the ontology format.

In one example, the data schema generator/updater module 116 may copy or provide the data schema for the best matching (e.g., the highest congruence measure or score) existing type of data pipeline component as a template for a data schema for the new type of data pipeline component. In one example, the data schema generator/updater module 116 may provide a notification to an operator of the data pipeline controller 110, e.g., at one of client devices 188, indicating the automatic selection of a data schema template for the new type of data pipeline component. In one example, the operator may then approve of the template for use as the data schema for the new type of data pipeline component. In one example, the operator may make changes or modifications to the template, and provide the changes/modifications to the data schema generator/updater module 116. In one example, data schemas for the top X matching data pipeline components may be returned to the operator, who may browse and select one of the data schemas as a template (which may be unmodified, or which may be modified by the operator) that is returned to the data schema generator/updater module 116. Thus, the operator may verify that the data schema generator/updater module 116 is generating valid data schemas. The data schema generator/updater module 116 may then store the template (either modified or unmodified) as the data schema for the new type of data pipeline component, along with the respective ontology, in the ontology and data schema repository 115. Instances of the new type of data pipeline component may then be made available for use in the data pipeline infrastructure 120.

To support the fulfillment of requests by the data pipeline controller 110, there may be a catalog of predefined “information models,” stored in information model repository 114. The information models may comprise specifications for data pipelines for various use cases. For instance, in one example each “task type” may have an associated information model. In another example, there may be a number of information models associated with each task type. For instance, a first information model associated with the task type of “market intelligence” may relate to “cellular,” and a second information model associated with the task type of “market intelligence” may relate to “VoIP.” In one example, each information model may be associated with or may comprise metadata relating to one or more of: a name, a region, a task type, a technology, and various other types of parameters. As illustrated in FIG. 1, an information model 195 may be submitted by an operator via one of client devices 188 to the information model updater/generator module 113, which may store the information model in information model repository 114. Once stored in the information model repository 114, the information model 195 may then be used in fulfillment of requests (e.g., requests which are matched to the information model 195).

As noted above, each information model may comprise a specification for a data pipeline. For instance, each information model may comprise hooks to a plurality of data schemas. The data schemas may be for a plurality of data pipeline component types. As also noted above, the data schemas are specific to particular component types, and provide information on how each of the data source(s) and/or data pipeline components 127 may be utilized, accessed, interacted with, etc. For instance, data pipeline components 127 may include components of various component types, such as: adapters, collectors, intermediate nodes, forwarders, data stores, and so forth. For instance, data pipeline components 127 may include two components of type “A” (e.g., A1 and A2), two components of type “B” (e.g., B1 and B2), and one component each of component types “C” and “D.” In the present example, information model 195 may comprise or provide a specification which may result in the establishment and/or reconfiguration of the data pipeline 121, which may include A1, B1, C, and D from data pipeline components 127. In one example, the information model, or specification, may include a plurality of mini-specifications for driving data retrievals and data joins. For instance, each mini-specification may be tailored to a respective data source (or data source type). In one example, a higher-level specification may be delivered to intermediate points to merge data streams. The specification(s) may be configured based upon the data schemas of respective data pipeline component types and the overall sequence of the information model 195.
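One way to picture an information model with hooks to data schemas is the following Python sketch. It is a minimal illustration, assuming a hypothetical in-memory schema repository keyed by strings and a simple ordered list of component types; the class names, fields, and the shape of each mini-specification are invented for the example rather than taken from the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class DataSchema:
    component_type: str
    commands: list[str]  # hypothetical: configuration commands the type accepts


@dataclass
class InformationModel:
    name: str
    sequence: list[str]  # ordered component types, e.g. ["A", "B", "C", "D"]
    hooks: dict[str, str] = field(default_factory=dict)  # type -> schema repository key

    def resolve(self, schema_repo: dict[str, DataSchema]) -> list[DataSchema]:
        """Follow each hook to the data schema for that component type, in pipeline order."""
        return [schema_repo[self.hooks[ctype]] for ctype in self.sequence]

    def mini_specifications(self, data_sources: list[str]) -> list[dict]:
        """One mini-specification per data source, feeding the first node in the sequence."""
        return [{"source": src, "first_hop_type": self.sequence[0], "model": self.name}
                for src in data_sources]


schema_repo = {f"schema/{t}": DataSchema(t, [f"configure_{t.lower()}"]) for t in "ABCD"}
model_195 = InformationModel(
    name="base_station_performance",
    sequence=["A", "B", "C", "D"],
    hooks={t: f"schema/{t}" for t in "ABCD"},
)
schemas = model_195.resolve(schema_repo)
specs = model_195.mini_specifications(["bs_ne_1", "bs_ne_2"])
```

Keeping the hooks as indirections (rather than embedding schemas in the model) means a schema updated in the repository is picked up the next time the model is resolved.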

A mini-specification may also be tailored to a set of pipeline instances, where data from a more general view is filtered or enriched for their instance-specific scopes. For example, data fulfillment, management, and assembly modules may efficiently optimize synergies across pipeline requirements, maintain data source updates from sources, utilize transformation processes to map those updates to pipeline instance requirements, and manage filtering, enriching, and propagating the updates into pipeline instances for data ingestion. Using the information models and data pipeline requirements, the data pipeline controller 110 may optimize pipeline infrastructure workload requirements to maximize and manage synergies across existing/new data pipeline controller types, to ensure that data source updates occur to fulfill data pipeline instance requirements and service level agreements (SLAs), and to further achieve economies of scale.
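The filter-and-enrich propagation described above might look like the following Python sketch. It assumes, hypothetically, that each pipeline instance is described by a scope predicate and an enrichment function, and that updates from the shared general view arrive as a list of dictionaries; none of these names come from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict


@dataclass
class PipelineInstance:
    name: str
    in_scope: Callable[[Record], bool]   # instance-specific filter (assumed form)
    enrich: Callable[[Record], Record]   # instance-specific enrichment (assumed form)
    inbox: list[Record] = field(default_factory=list)


def propagate(updates: list[Record], instances: list[PipelineInstance]) -> None:
    """Push each update from the general view to every instance whose scope admits it."""
    for record in updates:
        for inst in instances:
            if inst.in_scope(record):
                # Copy first so that one instance's enrichment never leaks into another's.
                inst.inbox.append(inst.enrich(dict(record)))


updates = [
    {"cell": "c1", "region": "north", "throughput": 10},
    {"cell": "c2", "region": "south", "throughput": 7},
]
north_mi = PipelineInstance(
    "north_market_intelligence",
    in_scope=lambda r: r["region"] == "north",
    enrich=lambda r: {**r, "task_type": "market intelligence"},
)
load_balancing = PipelineInstance("load_balancing", in_scope=lambda r: True, enrich=lambda r: r)
propagate(updates, [north_mi, load_balancing])
```

The design point is that the source data is retrieved once into the general view, while each instance receives only its filtered, enriched slice.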

In one example, a new information model, such as information model 195, may lead to the discovery of a new data pipeline component type. For instance, an information model may assume the existence of a data pipeline component type for which there is no record in the ontology and data schema repository 115. In such case, the information model updater/generator module 113 may notify the operator via the client device 188 that an ontology and data schema are missing for this assumed-to-be new data pipeline component type. In one example, the operator of client device 188 may provide an ontology, a data schema, or both, which may be provided to the data schema generator/updater module 116. In another example, the operator may contact a vendor, which may be requested to provide an ontology and/or a data schema.

To further illustrate the functions and features of data pipeline controller 110, an example request 197 for delivery of data from one or more of the data sources 125 to one or more of the targets 129 may be processed by the data pipeline controller 110 as follows. First, the request 197 may be crafted via a client device 188, which may specify a desired delivery of data from one or more of data sources 125 to one or more of the targets 129. It should be noted that in one example, the request 197 may comprise a “trigger,” e.g., where the requesting client device 188 is an automated system. The request 197 may identify specific types of data, specific fields of data, specific sources or types of sources, geographic locations of sources, or logical groupings of sources (e.g., all routers within a given network region, all devices in a subnet, all base stations in a selected state, wind speed information for a selected geographic area for a selected time period, all captured images or video in a selected area for a selected period of time, etc.). In one example, a user may generate the request 197 in accordance with a request template, such as in accordance with the example request template/specification and analysis/planning use pattern described above.

In one example, the request 197 may initially be received and processed via the request interpreter and fulfillment module 111 of data pipeline controller 110. The request interpreter and fulfillment module 111 may first attempt to match the request 197 to a most applicable information model. For instance, the request interpreter and fulfillment module 111 may first parse the request to determine which data sources 125 are implicated, the data of data sources 125 that is being requested, the target(s) 129 to which the data is to be delivered, etc. The request 197 may be simple in some cases, but may include (directly or by reference) detailed specification information such that the appropriate data or dataset(s) can be identified, prepared, and provided. Note that in some cases, the request interpreter and fulfillment module 111 may first check if a same request has recently been processed by the data pipeline controller 110, and if so, some portion of the fulfillment process may be omitted for the sake of efficiency (e.g., if various safety/quality assurance criteria are met). For instance, a specification for the request 197 may be sent to data sources 125 and resulting data may be joined in appropriate node(s) (e.g., data pipeline components 127) in order to avoid unnecessary work, with final data/dataset(s) then being delivered to the desired target(s) 129. Otherwise, additional analysis and planning may first be executed.
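The "same request recently processed" check can be sketched as a small time-bounded cache. This Python sketch assumes, for illustration only, that a request is a JSON-serializable dictionary and that "recently" means within a configurable time-to-live; hashing a canonical JSON form makes requests with the same content but different key order compare equal. The class and parameter names are hypothetical.

```python
import hashlib
import json
import time
from typing import Optional


class RecentRequestCache:
    """Hypothetical cache of recently fulfilled requests, keyed by content hash."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._seen: dict[str, float] = {}  # request key -> time it was processed

    @staticmethod
    def key(request: dict) -> str:
        # Canonical form: sorted keys and fixed separators, so key order is irrelevant.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def seen_recently(self, request: dict, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        processed_at = self._seen.get(self.key(request))
        return processed_at is not None and now - processed_at <= self.ttl

    def record(self, request: dict, now: Optional[float] = None) -> None:
        self._seen[self.key(request)] = time.time() if now is None else now


cache = RecentRequestCache(ttl_seconds=3600)
request_197 = {"task_type": "market intelligence", "region": "NPA 212"}
cache.record(request_197, now=0.0)
```

A hit would only allow the shortcut path; the safety and quality assurance criteria mentioned above would still need to be evaluated separately.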

In one example, the request interpreter and fulfillment module 111 may be configured to process requests that may be in accordance with various Data Definition Languages (e.g., Structured Query Language (SQL), eXtensible Markup Language (XML) Schema Definition (XSD) Language, JavaScript Object Notation (JSON) Schema, etc.). In one example, the request interpreter and fulfillment module 111 comprises an abstract symbol manipulator that extracts symbols from data definition languages and handles rules relating the symbols. As such, the data pipeline controller 110 may handle any data for which descriptor symbols have been provided.
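As a narrow illustration of symbol extraction, the following Python sketch walks a JSON Schema document (represented as a Python dictionary) and collects dotted field paths from its `properties` and `items` keywords. It is only an assumption about what "extracting symbols" might involve for one of the listed languages; SQL or XSD inputs would need their own extractors, and the path notation (`.` for nesting, `[]` for arrays) is invented here.

```python
def extract_symbols(schema: dict, prefix: str = "") -> list[str]:
    """Collect dotted field paths from a JSON Schema's object properties.

    Only the "properties", "type", and "items" keywords are interpreted;
    combinators such as allOf/oneOf and $ref are deliberately ignored.
    """
    symbols: list[str] = []
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        symbols.append(path)
        if sub.get("type") == "object":
            symbols.extend(extract_symbols(sub, path + "."))
        elif sub.get("type") == "array" and isinstance(sub.get("items"), dict):
            symbols.extend(extract_symbols(sub["items"], path + "[]."))
    return symbols


base_station_schema = {
    "type": "object",
    "properties": {
        "cell_id": {"type": "string"},
        "metrics": {"type": "object", "properties": {"throughput": {"type": "number"}}},
        "alarms": {
            "type": "array",
            "items": {"type": "object", "properties": {"code": {"type": "integer"}}},
        },
    },
}
symbols = extract_symbols(base_station_schema)
```

The resulting symbol list could then be matched against rules or against the metadata of information models, independently of which data definition language the request used.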

In one example, the data pipeline controller 110 may map the request 197 to a most appropriate information model. For instance, the request 197 may comprise metadata relating to one or more names (e.g., of one or more of the data sources 125, targets 129, types of data sources, and/or types of targets, etc.), one or more regions (e.g., a town, a county, a state, a numbering plan area (NPA), a cell and/or a cluster of cells, a subnet, a defined network region (e.g., a marketing area), etc.), one or more task types (e.g., “market intelligence,” “network load balancing,” “media event support” (e.g., data analysis for large network-impacting events, such as large concerts, sporting events, etc.), and so forth), a technology (e.g., cellular, Voice over Internet Protocol (VoIP), fiber optic broadband, digital subscriber line (DSL), satellite, etc.), and/or various additional parameters. Such metadata, or parameters, may be explicitly defined in the request 197 as particular metadata fields or may be extracted from the terms of the request 197 (e.g., identified in a query in accordance with a particular Data Definition Language). In any case, the request interpreter and fulfillment module 111 may identify various metadata/parameters of the request 197 and may provide such terms to the information model repository 114.

The information model repository 114 may store a plurality of “information models” (e.g., a catalog or data store). The information models may comprise specifications for data pipelines for various use cases. For instance, in one example each “task type” may have an associated information model. In another example, there may be a number of information models associated with each task type. For instance, a first information model associated with the task type of “market intelligence” may relate to “cellular,” and a second information model associated with the task type of “market intelligence” may relate to “VoIP.” In one example, each information model may be associated with or may comprise metadata relating to one or more of: a name, a region, a task type, a technology, and various other types of parameters.

In one example, the information model repository 114 may map the request to one or more of the information models. For instance, the information model repository 114 may map the request to at least a first information model based upon a congruence between the metadata of the request and the metadata of each of the one or more information models. For instance, an information model having metadata that most closely matches the metadata of the request 197 may be identified. In one example, the top X information models having the closest matches to the metadata of the request 197 may be identified. The matching of the request 197 to each information model may be scored based upon a number of metadata fields that match. In one example, some fields may be weighted such that a match (or lack thereof) with respect to a given metadata field may have a greater or lesser impact on an overall score for the congruence, or match, between a given request and a particular information model. In one example, the top matching information model, or the top X matching information models, may then be returned to the request interpreter and fulfillment module 111. It should be noted that in another example, the matching may be performed via the request interpreter and fulfillment module 111. For instance, the request interpreter and fulfillment module 111 may scan the information models in the information model repository 114 to determine matching scores for different information models. However, in any case, the request interpreter and fulfillment module 111 may select one of the information models (e.g., the top matching information model) for use in establishing and/or reconfiguring a data pipeline to fulfill the request 197. It should be noted that in one example, the request 197 may be submitted in accordance with a request template that may be matched to the information model 195. In such case, the request interpreter and fulfillment module 111 may select the information model 195 based upon the stored association between the request 197 and the information model 195. It should also be noted that in one example, the request interpreter and fulfillment module 111 may provide user tendency and behavioral tracking and analytics. For instance, the request interpreter and fulfillment module 111 may provide an enhanced user experience in which the request interpreter and fulfillment module 111 may recognize the requestor and may use the requestor's past tendencies to quickly identify and suggest one or more relevant information models.
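The weighted metadata matching between a request and the catalog of information models can be sketched in Python as below. The field names follow the metadata categories listed above (name, region, task type, technology), but the weights, catalog entries, and function names are hypothetical. A field counts only if the request actually specifies it, so two missing values are never treated as a match.

```python
# Assumed weights: task type and technology dominate the match.
FIELD_WEIGHTS = {"task_type": 3.0, "technology": 2.0, "region": 1.0, "name": 1.0}


def match_score(request_meta: dict, model_meta: dict,
                weights: dict = FIELD_WEIGHTS) -> float:
    """Sum the weights of metadata fields that the request specifies and the model matches."""
    return sum(w for f, w in weights.items()
               if f in request_meta and request_meta[f] == model_meta.get(f))


def top_models(request_meta: dict, catalog: dict[str, dict], top_x: int = 1) -> list[str]:
    """Names of the top_x best-matching information models, best first."""
    ranked = sorted(catalog, key=lambda name: match_score(request_meta, catalog[name]),
                    reverse=True)
    return ranked[:top_x]


catalog = {
    "mi_cellular": {"task_type": "market intelligence", "technology": "cellular"},
    "mi_voip": {"task_type": "market intelligence", "technology": "VoIP"},
    "load_balancing": {"task_type": "network load balancing", "technology": "cellular"},
}
request_meta = {"task_type": "market intelligence", "technology": "VoIP", "region": "NPA 212"}
best = top_models(request_meta, catalog)
```

Returning more than one name (`top_x > 1`) corresponds to offering the requestor a short list of suggested information models rather than a single selection.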

In one example, the request interpreter and fulfillment module 111 may provide a notification of the selected information model(s) to the client device 188 that submitted the request 197. In one example, the notification may provide an opportunity for the client device 188 to submit a confirmation to the request interpreter and fulfillment module 111 to proceed with the selected information model (or to select one of the suggested information models for use). Likewise, the notification may provide an opportunity for the client device 188 to decline a selected information model. In such case, the request interpreter and fulfillment module 111 may provide one or more additional information models as suggestions (e.g., one or more of the next top X of the closest matching information models). Alternatively, or in addition, the notification may provide the client device 188 with the opportunity to modify a selected information model, or to create a new information model using the selected information model as a template (e.g., along with possible additional modifications). For instance, a user of the client device 188 submitting the request 197 may be aware of a new type of data pipeline component that is desired to be included in the eventual data pipeline. As such, the user may modify the information model and submit it as a change to the information model, or may submit it as a new information model.

In one example, for each new information model that is submitted, and/or for each information model that is modified, the information model updater/generator module 113 may verify that data source(s) 125, data pipeline component(s) 127, and/or target(s) 129 exist that are of the types of data source(s), data pipeline component(s), and/or target(s) indicated in the specification of the information model, and which are permitted to be controlled via the data pipeline controller 110. In other words, the information model updater/generator module 113 may first verify that the data pipeline infrastructure 120 is able to fulfill requests that may invoke the information model. In one example, the information model updater/generator module 113 may communicate with the data pipeline component discovery module 118 to complete this task. For instance, data pipeline component discovery module 118 may maintain an inventory of all of the available data pipeline infrastructure 120 (e.g., data source(s) 125, data pipeline components 127, target(s) 129, etc.).

In one example, each time a component is added to the data pipeline infrastructure 120, a notification may be provided to the data pipeline component discovery module 118. For instance, each of the data pipeline components 127 may be configured to self-report an instantiation and/or a deployment. Alternatively, or in addition, a software defined network (SDN) controller that is responsible for deploying one of the data pipeline components 127 may transmit a notification to the data pipeline component discovery module 118. Similarly, a user who is responsible for deploying one of the data pipeline components 127 may be responsible for a notification to the data pipeline component discovery module 118 (e.g., via one of client devices 188).
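The inventory check and deployment notifications described in the two preceding paragraphs can be sketched together in Python. This is a hypothetical stand-in for an inventory such as the one kept by the data pipeline component discovery module 118: `register` records a notification (whether self-reported, sent by an SDN controller, or entered by a user), and `missing_types` reports which component types named by an information model have no controllable instance.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryInventory:
    """Hypothetical inventory of deployed pipeline infrastructure."""
    components: dict[str, str] = field(default_factory=dict)  # instance -> component type
    controllable: set[str] = field(default_factory=set)       # instances the controller may configure

    def register(self, instance: str, component_type: str, controllable: bool = True) -> None:
        """Record a deployment notification for one instance."""
        self.components[instance] = component_type
        if controllable:
            self.controllable.add(instance)
        else:
            self.controllable.discard(instance)

    def missing_types(self, required_types: list[str]) -> list[str]:
        """Component types required by an information model with no controllable instance."""
        available = {t for inst, t in self.components.items() if inst in self.controllable}
        return sorted(set(required_types) - available)


inventory = DiscoveryInventory()
for instance, ctype in [("A1", "A"), ("A2", "A"), ("B1", "B"), ("B2", "B"), ("C", "C")]:
    inventory.register(instance, ctype)
inventory.register("D", "D", controllable=False)  # deployed, but not controllable here
gaps = inventory.missing_types(["A", "B", "C", "D"])
```

An empty result would indicate that the infrastructure can, at least in terms of component types, fulfill requests invoking the information model.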

It should be noted that new information models may be submitted in connection with a request fulfillment process, or may be submitted without connection to a particular request. For instance, a user may develop an information model for a new anticipated use case, without having a specific request for which a data pipeline is to be immediately built. In one example, a user, e.g., via one of client devices 188, may browse the catalog of the information model repository 114 and may utilize any existing information models as a template for a new information model. As illustrated in FIG. 1, the interactions of data pipeline controller 110 and one of client devices 188 for generating and/or submitting a new information model may be via information model updater/generator module 113. However, in another example, the information model repository 114 may alternatively or additionally comprise an application programming interface (API) which may allow more direct access to the catalog of information models from one of client devices 188. In one example, user objects, information model objects, and data pipeline component type objects are all first-class citizens in the architecture, so any user could act on (view) any information model/template or data pipeline component type. In addition, there may be no unnecessary hierarchical control imposed over the inventories that would reduce data sharing and limit automation. In accordance with the present disclosure, each object may have an intrinsic identity, may be dynamically constructed at runtime, and may be passed as a parameter.

Once an information model is selected and finalized (e.g., approved for use and/or not objected to), the request interpreter and fulfillment module 111 may also verify that the client device 188 and/or a user thereof is authorized to create a data pipeline with regard to the data being requested, that the desired target(s) 129 are permitted to receive the requested data, that the client device 188 and/or a user thereof is permitted to utilize particular data pipeline component types that are indicated in the specification, and so forth. For instance, the request interpreter and fulfillment module 111 may submit the specification to authorization module 112 along with an identification of the one of client devices 188 and/or an identification of a user thereof. Authorization module 112 may maintain records of the permissions for various ones of client devices 188 and/or various users or user groups, the permissions of various data pipeline component types, the permissions for specific ones of the data pipeline components 127, data source(s) 125, and/or target(s) 129, and so forth. In one example, authorization module 112 may additionally include information regarding user preferences, limitations, exception handling procedures, etc. If the records associated with the user, the one of client devices 188, the data pipeline component type(s), etc. indicate that a data pipeline may be built or adapted to fulfill the request 197 in accordance with the selected information model, then the authorization module 112 may return a positive confirmation, or authorization, to the request interpreter and fulfillment module 111. In addition, upon receipt of a positive confirmation/authorization, the request interpreter and fulfillment module 111 may submit the selected information model (e.g., along with parameters of the request 197) to the data pipeline management and assembly (DPMA) module 117.
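The three authorization checks named above (requestor may request the data, each target may receive it, requestor may use each component type) can be sketched as follows. The permission tables and names are hypothetical, since the record format of authorization module 112 is not specified; the sketch collects every failed check rather than stopping at the first, so the requestor could be told all reasons at once.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionRecords:
    # Hypothetical permission tables.
    user_data: dict[str, set[str]] = field(default_factory=dict)             # user -> data sets
    user_component_types: dict[str, set[str]] = field(default_factory=dict)  # user -> component types
    target_data: dict[str, set[str]] = field(default_factory=dict)           # target -> data sets


def authorize(records: PermissionRecords, user: str, data_set: str,
              targets: list[str], component_types: list[str]) -> tuple[bool, list[str]]:
    """Return (authorized, reasons); reasons lists every failed check."""
    reasons: list[str] = []
    if data_set not in records.user_data.get(user, set()):
        reasons.append(f"user {user} may not request {data_set}")
    for target in targets:
        if data_set not in records.target_data.get(target, set()):
            reasons.append(f"target {target} may not receive {data_set}")
    allowed_types = records.user_component_types.get(user, set())
    for ctype in sorted(set(component_types) - allowed_types):
        reasons.append(f"user {user} may not use component type {ctype}")
    return (not reasons, reasons)


records = PermissionRecords(
    user_data={"alice": {"bs_perf"}},
    user_component_types={"alice": {"A", "B", "C", "D"}},
    target_data={"mlm_1": {"bs_perf"}},
)
ok, why = authorize(records, "alice", "bs_perf", ["mlm_1"], ["A", "B", "C", "D"])
```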

In one example, the DPMA module 117 is responsible for generating a data pipeline or reconfiguring a data pipeline to fulfill the request 197 in accordance with the information model that is selected (such as information model 195). For instance, the DPMA module 117 may decompose the specification of the information model 195 into mini-specifications for driving data retrieval and data joins, e.g., one mini-specification per data source. For instance, in the present example, information model 195 may comprise or provide a specification which may result in the establishment and/or reconfiguration of the data pipeline 121, which may include A1, B1, C, and D from data pipeline components 127. In one example, a higher-level specification may be delivered to intermediate points to merge data streams. To illustrate, the DPMA module 117 may determine that the information model provides a roadmap for establishing a data pipeline for delivering base station performance data from one or more data sources to one or more targets. The request parameters may provide information regarding the geographic scope of the request. In one example, the DPMA 117 may select particular data sources of data sources 125 having the requisite base station performance data in accordance with the geographic scope information. In one example, the determination may be made using information stored in data pipeline component discovery module 118.

In one example, the information model may indicate that an aggregator component is called for as a first intermediate node in data pipeline 121. DPMA 117 may determine that there are multiple aggregator components available in the data pipeline infrastructure 120 (e.g., A1 and A2). DPMA 117 may then select one of these in accordance with the request parameters, e.g., using the geographic scope information, using information regarding the distance or latency from the data source(s) 125 (e.g., after selecting the appropriate data source(s) 125), and so forth. For instance, in the present example, DPMA 117 may select an aggregator component A1 from the available data pipeline components 127. It should be noted that DPMA 117 may select additional data pipeline components B1, C, and D from the available data pipeline components 127 following a similar analysis.
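The selection between candidate aggregators such as A1 and A2 can be sketched as a filter on geographic scope followed by a minimum-latency choice. The `Candidate` fields, the region labels, and the latency figures are assumptions made for the example; the disclosure names geographic scope and distance or latency as possible criteria without fixing how they are combined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    component_type: str
    region: str
    latency_ms: float  # assumed: measured latency from the already-selected data sources


def select_component(candidates: list[Candidate], component_type: str,
                     region: str) -> Optional[str]:
    """Pick the in-region candidate of the required type with the lowest latency."""
    eligible = [c for c in candidates
                if c.component_type == component_type and c.region == region]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.latency_ms).name


candidates = [
    Candidate("A1", "aggregator", "northeast", 4.0),
    Candidate("A2", "aggregator", "northeast", 9.0),
    Candidate("A3", "aggregator", "southwest", 1.0),
]
chosen = select_component(candidates, "aggregator", "northeast")
```

Filtering by region first keeps a nearer but out-of-scope instance (A3 here) from being chosen purely on latency.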

In one example, DPMA 117 may instantiate the data pipeline 121 in response to the request 197 (or in response to an instruction from the request interpreter and fulfillment module 111 containing the selected information model and parameters of the request 197). In one example, DPMA 117 may configure the data pipeline components A1, B1, C, and D in accordance with hooks in the information model and/or specification which invoke data schemas associated with the respective data pipeline component types of the data pipeline components A1, B1, C, and D. For instance, a data schema for data pipeline component A1 may indicate the available commands which may be used to configure data pipeline component A1, the values of different arguments or parameters which may be used in one or more commands, and so forth. In one example, the hooks in the information model (e.g., information model 195) may be executed by DPMA 117 to retrieve or to invoke the respective data schemas. However, specific configuration commands may be tailored to the particular data pipeline components 127 that are selected (e.g., to direct configuration commands to A1 (and not to A2), to B1 (and not to B2), to C, and to D). Accordingly, using the various data schemas, DPMA 117 may configure the data pipeline components A1, B1, C, and D to function as data pipeline 121 and to move the requested data from the one or more of data sources 125 to one or more of targets 129.
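The role of a data schema in configuration (which commands exist, which arguments each accepts) and the tailoring of commands to a selected instance can be sketched as follows. The schema shape, a mapping from command name to allowed argument names, is a hypothetical simplification; a real data schema would also carry the relationships among data attributes and the mini-flow described earlier.

```python
# Hypothetical data schema for component type "A" (an aggregator): each
# configuration command maps to the set of argument names it accepts.
AGGREGATOR_SCHEMA = {
    "subscribe": {"topic"},
    "forward": {"destination", "interval_s"},
}


def build_commands(instance: str, schema: dict[str, set[str]],
                   requested: list[tuple[str, dict]]) -> list[dict]:
    """Validate (command, args) pairs against a data schema and address them
    to one selected instance (e.g. A1, never A2)."""
    commands = []
    for name, args in requested:
        allowed = schema.get(name)
        if allowed is None:
            raise ValueError(f"{name!r} is not a command in this schema")
        unknown = set(args) - allowed
        if unknown:
            raise ValueError(f"{name!r} does not accept {sorted(unknown)}")
        commands.append({"instance": instance, "command": name, "args": dict(args)})
    return commands


a1_commands = build_commands("A1", AGGREGATOR_SCHEMA, [
    ("subscribe", {"topic": "bs-perf-northeast"}),
    ("forward", {"destination": "B1", "interval_s": 60}),
])
```

Validating against the schema before anything is sent means a malformed information model fails at planning time rather than partway through configuring a live pipeline.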

To illustrate, data pipeline component A1 may be configured to obtain base station operational data from at least two of the data sources 125 and to aggregate the data at the node. For instance, data pipeline component A1 may utilize Apache Kafka, Data Movement as a Platform (DMaaP), nanomsg, or the like to “subscribe” to the data from the relevant data sources 125. In one example, data pipeline component A1 may also be configured to periodically forward the aggregated data to data pipeline component B1. Data pipeline component B1 may be configured to generate summary data, such as 5-minute moving averages, to pare the data, such as by removing extra fields, and so forth. Data pipeline component C may be configured to obtain summary data from data pipeline component B1 (e.g., again using Kafka, DMaaP, nanomsg, or the like), to smooth the data and remove any outliers, and to place the processed data into a JSON format. Lastly, data pipeline component D may be configured to periodically obtain the further processed data from data pipeline component C, to store a copy of the processed data, and to forward the processed data to the desired one or more of targets 129.
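The processing attributed to components B1 and C can be sketched in a few functions. This is a minimal illustration under stated assumptions: one sample per minute (so a 5-sample window approximates a 5-minute moving average), and a k-standard-deviation outlier rule, which is a swapped-in choice since the text does not specify how outliers are identified. The function names are hypothetical.

```python
import json
from statistics import mean, pstdev


def moving_average(values, window=5):
    """Stage B1 (sketch): trailing moving average; with one sample per minute,
    window=5 approximates a 5-minute moving average."""
    return [mean(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


def remove_outliers(values, k=2.0):
    """Stage C (sketch): drop samples more than k population standard
    deviations from the mean. The k-sigma rule is an assumed choice."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) <= k * sigma]


def to_json(cell_id, values):
    """Stage C (sketch): serialize the processed series for stage D."""
    return json.dumps({"cell": cell_id, "values": values})


raw = [10] * 9 + [100]            # one spike among steady readings
clean = remove_outliers(raw)      # the spike at 100 is dropped
print(to_json("cell-7", moving_average(clean)))
```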

It should be noted that in one example, parameters of the request 197 may indicate a limited temporal scope of the requested data. As such, in one example, DPMA 117 may configure the data pipeline components A1, B1, C, and D to cease the specific functions configured for data pipeline 121 after the temporal scope of the request has passed. However, it should also be noted that, as indicated above, the data pipeline component discovery module 118 may maintain information regarding the availability and current configurations of data pipeline components 127, the data pipeline 121, other data pipelines, etc. As such, in one example, all or a portion of the data pipeline 121 (e.g., the configurations of any one or more of the data pipeline components A1, B1, C, and D) may be maintained after the fulfillment of the request 197, such as if a new request is received and processed by data pipeline controller 110 and it is determined that the same data is being requested. Thus, for example, the data may be maintained in data pipeline component D for an additional duration so as to fulfill this additional request. For instance, there may be one or more predictors, based on historical trends, suggesting that one or more of the data sources 125 may be reused.

Alternatively, or in addition, the new request may be for obtaining data that partially overlaps with the data requested in request 197. For instance, the new request may be for similar base station operational data having the same geographic scope, but for a more extended time period, or for a time period that partially overlaps with a time period specified in the request 197. In such case, DPMA 117 may maintain the data pipeline 121 for an additional duration so as to obtain the additional data associated with the time period of the new request. Additional scenarios may also lead to the full or partial reuse of data pipeline 121 or other data pipelines. For instance, in another example, data pipeline 121 may be integrated with another data pipeline, may be expanded with one or more additional data pipeline components to fulfill a new request (such as adding an additional aggregator for obtaining additional base station operational data from an additional geographic region), and so forth. DPMA 117 may maintain an underlying source feed process that a plurality of data pipeline instances depend on, as long as a subset of the data pipeline instances continue to exist. DPMA 117 may also be able to reduce the frequency of enrichment, or lower other characteristics of one or more of the remaining data pipeline instances, to compensate for new resulting requirements of any or all of the remaining data pipeline instances.
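The partial-overlap case reduces to simple interval arithmetic. The sketch below assumes each request's temporal scope is a half-open `[start, end)` window; `overlap` and `extended_window` are hypothetical helper names, and the dates are illustrative only.

```python
from datetime import datetime


def overlap(a, b):
    """Shared part of two half-open [start, end) windows, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def extended_window(a, b):
    """Window a reused pipeline must stay up to serve both requests.
    Only meaningful when the windows overlap or touch."""
    return (min(a[0], b[0]), max(a[1], b[1]))


req_197 = (datetime(2020, 3, 1, 0), datetime(2020, 3, 1, 12))
new_req = (datetime(2020, 3, 1, 6), datetime(2020, 3, 1, 18))
if overlap(req_197, new_req):
    keep_until = extended_window(req_197, new_req)[1]
    print("maintain data pipeline 121 until", keep_until)
```

Here the pipeline would be kept up past the end of request 197's window until the later request's window closes.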

To further illustrate, in one example, data pipeline 121 may be in existence (e.g., having been created and configured, and either in use or remaining idle/in standby mode) prior to the request 197. In such case, similar to the example above, DPMA module 117 may determine that the information model provides a roadmap for establishing a data pipeline for delivering base station performance data from one or more data sources to one or more targets. The request parameters may provide information regarding the geographic scope of the request. Thus, the DPMA 117 may select particular data sources of data sources 125 having the requisite base station performance data in accordance with the geographic scope information. In one example, the determination may be made using information stored in data pipeline component discovery module 118. However, the information stored in data pipeline component discovery module 118 may also indicate that data pipeline 121 is operational within the data pipeline infrastructure 120 and is available to fulfill the request 197. In this case, the nodes of data pipeline 121 (e.g., data pipeline components A1, B1, C, and D) may be reconfigured to fulfill the request 197. For instance, the data pipeline components A1, B1, C, and D may be configured/reconfigured using commands via the respective data schemas to obtain additional data within the temporal and geographic scope of the request 197, to forward the processed data to one or more of the targets 129 via data pipeline component D, and so forth.

In still another example, the DPMA module 117 may determine, in accordance with the information model selected for request 197, that the requested data may already be stored, e.g., at data pipeline component D. For instance, data pipeline component D may have come into possession of the data in accordance with a different request for which the data pipeline 121 was established. In such an example, data pipeline component D may also store extra data that is not relevant to request 197. Even so, DPMA 117 may establish a new, shortened data pipeline to fulfill request 197. For instance, the data pipeline may comprise data pipeline component D (and in one example, the one or more target(s) 129, which may also be considered part of the data pipeline). In such case, the configuration may involve configuring the target(s) 129 as subscribers to a data feed from data pipeline component D comprising the portion of the data stored therein that is pertinent to the request 197.
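Serving only the pertinent portion of what component D already stores amounts to filtering by the request's scope. In this minimal sketch, the record layout (a region and an integer timestamp per record) and the `pertinent_records` name are assumptions for illustration.

```python
def pertinent_records(stored, region, start, end):
    """Return only the stored records within the request's geographic scope
    and half-open time window [start, end)."""
    return [r for r in stored if r["region"] == region and start <= r["ts"] < end]


# Records held at component D from an earlier request (illustrative values;
# ts is an assumed integer epoch-minute timestamp).
stored_at_d = [
    {"region": "northeast", "ts": 100, "kpi": 0.91},
    {"region": "northeast", "ts": 250, "kpi": 0.88},
    {"region": "southwest", "ts": 120, "kpi": 0.75},
]
feed_for_197 = pertinent_records(stored_at_d, "northeast", 0, 200)
```

The target(s) subscribing to component D would receive `feed_for_197` rather than everything D holds.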

In addition to the foregoing, data pipeline controller 110 may also include a data blending module 119. In accordance with the present disclosure, data blending module 119 may establish a plan for blending data on behalf of a single request or multiple requests from multiple clients (e.g., for potential reuse and data processing efficiency). For example, when multiple data delivery requests are received, the data blending module 119 may be invoked to provide an analysis to determine if duplicated datasets are involved. The data blending module 119 may also check if all or a portion of a requested dataset or datasets is available in one or many of the intermediate nodes, e.g., data pipeline components 127. The data blending module 119 may also check if the relevant data source(s) 125 have been updated (e.g., while the intermediate node(s) may not have been synced/updated). In one example, the data blending module 119 may determine a most efficient pipeline configuration to deliver data from the relevant data source(s) 125 to the relevant target(s) 129 (or at least a more efficient pipeline configuration as compared to independently establishing data pipelines for each request according to respective information models associated with each request), e.g., with the fewest intermediate processing steps (or at least a reduced number of intermediate processing steps).

As noted above, request interpreter and fulfillment module 111 may initially obtain and process each data request. In particular, request interpreter and fulfillment module 111 may select an associated information model to use as a specification for establishing a data pipeline to fulfill each request. However, in one example, during the processing, whenever an information model indicates an available data blending capability, the request interpreter and fulfillment module 111 may invoke the data blending module 119. Alternatively, or in addition, the data blending module 119 may also consult policies of one or more of the requesting clients, and/or policies that are applicable to classes of clients and/or categories of requests, to determine whether, and for which request(s), data blending is permitted.

In an illustrative example, multiple requests (e.g., from different client devices 188) may be received via request interpreter and fulfillment module 111 at or around the same time, e.g., within 5 seconds, within 5 minutes, etc. The request interpreter and fulfillment module 111 may determine that data blending is permitted for any two or more of the requests. The request interpreter and fulfillment module 111 may then invoke the data blending module 119 for creating a combined plan for data path/data pipeline configuration. More specifically, the data blending module 119 may access information models associated with respective requests as its path creation guidance. In one example, the request interpreter and fulfillment module 111 may match requests to information models, e.g., as described above, and provide the information models to the data blending module 119. In another example, the data blending module 119 may perform the same or similar operations to independently match information models to the respective requests.
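Treating requests that arrive "at or around the same time" as candidates for blending can be sketched as a windowed grouping. The rule used here (a request joins a group if it arrives within `window_s` seconds of that group's first request) is one assumed interpretation; `group_by_arrival` and the request ids are hypothetical.

```python
def group_by_arrival(requests, window_s):
    """Group (request_id, arrival_seconds) pairs: a request joins the current
    group if it arrives within window_s of that group's first request."""
    groups = []
    for rid, t in sorted(requests, key=lambda r: r[1]):
        if groups and t - groups[-1][0][1] <= window_s:
            groups[-1].append((rid, t))
        else:
            groups.append([(rid, t)])
    return [[rid for rid, _ in g] for g in groups]


incoming = [("r199", 400.0), ("r197", 0.0), ("r198", 3.0)]
print(group_by_arrival(incoming, window_s=5))  # [['r197', 'r198'], ['r199']]
```

Each resulting group would still be subject to the policy check on whether blending is permitted.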

In one example, the data blending module 119 may determine common denominators among all data requests. For instance, the data blending module 119 may identify that at least a portion of data is a part of both a first data set that is specified in at least a first request and a second data set that is specified in at least a second request (e.g., requests 197 and 198, as illustrated in FIG. 1). The data blending module 119 may then create a new data delivery plan, or roadmap, for delivering requested data set(s) from source(s) to target(s) in accordance with the multiple data requests. This integrated roadmap may include (but is not limited to) a specification which details one or more of: the data source(s) 125 or intermediate nodes (e.g., one or more of data pipeline components 127) from which data is to be initially obtained, how the data is to be obtained from the different data source(s) 125 and/or data pipeline components 127, where to create one or more intermediate nodes (e.g., from data pipeline components 127) to store interim datasets, how long data is to be stored at the intermediate node(s), which data manipulations are to be performed on the interim datasets, and so forth. In one example, the data blending module 119 may confirm or obtain authorizations from authorization module 112, which may delimit the ability to leverage the new data plan as a standard data view for requests that may not conform to the data scope, filtering, transformation, encryption, etc. that may be performed on a specific data requirement.
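Identifying the portion of data that belongs to both data sets can be sketched as a per-source intersection. The representation used here (each data set as a mapping from a data source id to the set of fields requested from it) is an assumption, as are the source and field names.

```python
def common_denominators(first, second):
    """Per data source, the fields requested by both data sets.
    first/second map a data source id to the set of fields requested from it."""
    shared = {}
    for source in first.keys() & second.keys():
        fields = first[source] & second[source]
        if fields:
            shared[source] = fields
    return shared


data_set_197 = {"bs-east": {"prb_util", "throughput"}, "bs-west": {"throughput"}}
data_set_198 = {"bs-east": {"throughput", "drops"}, "core-1": {"cpu"}}
print(common_denominators(data_set_197, data_set_198))  # {'bs-east': {'throughput'}}
```

A non-empty result is what would prompt the data blending module to build a combined plan rather than two independent pipelines.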

In one example, the plan, or roadmap, may comprise a combination of a first information model associated with the first request (e.g., request 197) and a second information model associated with the second request (e.g., request 198), with at least one modification to the combination of the first information model and the second information model. It should be noted that in other examples, the plan may comprise a combination of a first information model, a second information model, a third information model, etc., along with one or more modifications to the combination (e.g., depending upon the number of requests that are identified as having overlapping requests for data). The modification(s) may include omission of one or more data pipeline components (e.g., as compared to a plan that would comprise a straight combination of the first information model and the second information model), an addition of at least one data pipeline component (e.g., a data pipeline component that is present in neither the first information model nor the second information model), an alteration to at least one setting for at least one data pipeline component that is present in the first information model or the second information model, and so on.
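The "omission" type of modification can be illustrated by comparing a straight combination with one that shares identical leading stages. This sketch assumes, for simplicity, that each information model is a linear sequence of `(component_type, setting)` stages and that only a common prefix is shared; real information models may be richer, and the function names are hypothetical.

```python
def straight_combination(model_a, model_b):
    """Every stage of both models, tagged with the pipeline it belongs to."""
    return [("a", s) for s in model_a] + [("b", s) for s in model_b]


def combine_with_omission(model_a, model_b):
    """Leading stages that are identical in both models are kept once and
    shared; the remaining stages stay separate per pipeline."""
    shared = []
    for stage_a, stage_b in zip(model_a, model_b):
        if stage_a != stage_b:
            break
        shared.append(("shared", stage_a))
    n = len(shared)
    return shared + [("a", s) for s in model_a[n:]] + [("b", s) for s in model_b[n:]]


model_197 = [("aggregator", "bs-east"), ("summarizer", "5min"), ("formatter", "json")]
model_198 = [("aggregator", "bs-east"), ("summarizer", "5min"), ("formatter", "csv")]
plan = combine_with_omission(model_197, model_198)
print(len(straight_combination(model_197, model_198)), len(plan))  # 6 4
```

The two omitted stages correspond to the duplicated aggregator and summarizer that a straight combination would otherwise deploy twice.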

In one example, the at least one modification to the combination of the first information model and the second information model may be selected for the plan based upon a number of factors, which may include a reduction (or an increase) of one or more metrics according to the plan as compared to the combination of the first information model and the second information model without the modification, such as: a determination of a reduction in an overall number of data pipeline components (and/or a reduction in a total number of intermediate processing steps), a determination of a reduction in a network bandwidth utilization, a reduction in a latency of a delivery of at least one of the first data set or the second data set (or an increase in an anticipated speed of delivery of the data set(s)), a determination of a reduction in a cost of a delivery of at least one of the first data set or the second data set, and so forth.

In one example, the data blending module 119 may select one or more modifications that do not violate a client policy (or client policies of respective clients). For instance, the client policy, or policies, may be contained in the first request (e.g., request 197), the second request (e.g., request 198), or both. Alternatively, or in addition, the policy, or policies, may be maintained by the data pipeline controller 110 on behalf of the client(s), e.g., at authorization module 112, or by the data blending module 119 itself. Each client policy may specify a restriction on a location of at least one data pipeline component, a restriction on a sharing of at least one data pipeline component, a restriction on an access of other clients to at least a portion of the first data set or the second data set (for instance, a client may own a data source and may wish for the data to be exclusively processed), and so forth. Alternatively, or in addition, the at least one modification may be selected in accordance with an operator policy of an operator of the data pipeline infrastructure 120. For example, the operator policy may balance a reduction in an overall number of data pipeline components with a reduction in a latency of a delivery of at least one of the first data set or the second data set. Examples of the types of data pipeline configurations that may result from a combined plan generated via data blending module 119 are illustrated in FIG. 4 and described in greater detail below.
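Policy-constrained selection among candidate plans can be sketched as a filter followed by a weighted score. In this sketch the policy keys (`no_sharing`, `allowed_locations`), the candidate fields, and the operator weights are all hypothetical; the weighted sum is just one assumed way an operator policy might balance component count against latency.

```python
def allowed(plan, client_policies):
    """A candidate plan is allowed only if it violates no client policy."""
    for policy in client_policies:
        if policy.get("no_sharing") and plan["shared"]:
            return False
        permitted = policy.get("allowed_locations")
        if permitted is not None and not set(plan["locations"]) <= permitted:
            return False
    return True


def choose_plan(candidates, client_policies, w_components=1.0, w_latency=0.1):
    """Among allowed candidates, minimize an operator score that balances
    component count against delivery latency (weights are assumptions)."""
    feasible = [c for c in candidates if allowed(c, client_policies)]
    if not feasible:
        return None
    best = min(feasible, key=lambda c: w_components * c["components"] + w_latency * c["latency_ms"])
    return best["name"]


candidates = [
    {"name": "straight", "components": 8, "latency_ms": 40, "shared": False, "locations": ["east", "west"]},
    {"name": "merged", "components": 5, "latency_ms": 55, "shared": True, "locations": ["east"]},
]
print(choose_plan(candidates, []))                      # merged
print(choose_plan(candidates, [{"no_sharing": True}]))  # straight
```

With these weights the merged plan scores 10.5 against 12.0, but a client policy forbidding sharing forces the straight combination.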

In one example, the data blending module 119 may create a plan that first attempts to reduce an overall number of data pipeline components by consolidating separate data pipeline components from two data pipelines into a single function (a single data pipeline component that is shared by the two data pipelines), and then selects a location (or provides criteria for selecting a location) for the shared data pipeline component. For instance, the plan may comprise a specification that indicates to use a data pipeline component in an existing location if one of the two pipelines is already established, or to select a location that minimizes latency or maximizes throughput based upon location criteria of both data pipelines (e.g., locations of sources and/or destinations, and/or locations of preceding and/or following nodes according to the respective information models, which may be combined to create the plan).
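Placement of the shared component can be sketched as follows. The objective used here, minimizing the worst-case latency to the endpoints of both pipelines, is an assumed interpretation of "minimizes latency"; the site names, endpoint names, and latency table are illustrative only.

```python
def place_shared_component(existing_site, sites, endpoints, latency_ms):
    """Reuse the existing site if one pipeline is already established;
    otherwise pick the site with the smallest worst-case latency to the
    endpoints (sources, destinations, or neighboring nodes) of both pipelines."""
    if existing_site is not None:
        return existing_site
    return min(sites, key=lambda s: max(latency_ms[(s, e)] for e in endpoints))


latency_ms = {
    ("nyc", "src-1"): 5, ("nyc", "dst-197"): 10, ("nyc", "dst-198"): 40,
    ("chi", "src-1"): 15, ("chi", "dst-197"): 20, ("chi", "dst-198"): 25,
}
endpoints = ["src-1", "dst-197", "dst-198"]
print(place_shared_component(None, ["nyc", "chi"], endpoints, latency_ms))   # chi
print(place_shared_component("nyc", ["nyc", "chi"], endpoints, latency_ms))  # nyc
```

A throughput-maximizing variant would swap the latency table for a capacity table and take the maximum of the per-site minimum instead.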

In one example, the data blending module 119 may select to add storage when at least a portion of data is identified as being from a partial overlap in time of a first data set in accordance with the first request (e.g., request 197) and a second data set in accordance with the second request (e.g., request 198). For instance, the data blending module 119 may select to add storage to an existing node, to a node of a first data pipeline, or to a node of a second data pipeline, or to add a new node that is not called for in either the first information model or the second information model, based upon overall efficiency in data delivery, such as a reduced latency, or a reduction in latency balanced with a cost of deployment of a new node, and so forth. In one example, the cost can be monetary, or the “cost” may be an additional resource utilization (e.g., a processor utilization, a memory utilization, available storage in the data pipeline infrastructure 120 or in a portion/region of the data pipeline infrastructure 120, a number of available nodes of a particular type in the data pipeline infrastructure 120 or in a portion/region of the data pipeline infrastructure 120, etc.), and so forth. In one example, the cost can be additional network bandwidth incurred to store data instead of streaming the data directly to one or more of the target(s) 129.

In one example, the plan, or roadmap, may be provided to the request interpreter and fulfillment module 111 as an executable information model. The request interpreter and fulfillment module 111 may then provide the plan (e.g., as an executable information model) to the DPMA module 117 to establish appropriate data pipelines or data paths to fulfill the multiple data requests. In one example, the plan may also be stored in the support modules (e.g., in the information model repository 114, as an information model for future reuse). The DPMA module 117 may then establish data pipelines according to the shared plan. In particular, the DPMA module 117 may utilize the shared plan in the same or similar manner as described above.

It should be noted that, in one example, the data blending module 119 may request information from data pipeline component discovery module 118 regarding the availability of data pipeline components 127 and the status of the data pipeline components 127, such as whether each of the data pipeline components 127 is configured as part of an existing data pipeline, whether there is spare capacity to configure the data pipeline component for use in additional data pipelines, the data that may be stored in the data pipeline components 127, and so forth. In one example, the data blending module 119 may utilize this topology and/or configuration information to determine the plan for one or more shared paths, or one or more alternate paths, for reliability, redundancy, and/or reuse.

For instance, the data blending module 119 may learn that data requested in one or both of the request 197 or the request 198 may already be stored at an intermediate node (one of data pipeline components 127). In one example, this particular intermediate node may be included in the plan. For instance, the plan (e.g., a specification) may leave no choice to the DPMA module 117 with respect to this particular data pipeline component. On the other hand, for additional data pipeline components, the data blending module 119 may create a plan with various criteria, such as directives, preferences, etc., which may be used by DPMA 117 to make selections from among the data sources 125 and data pipeline components 127 in accordance with the plan. For instance, these criteria may remain the criteria as specified in one or both of the information models associated with the requests 197 and 198, which may be combined to create the shared plan. Thus, the data blending module 119 may use information models as inputs, which can be adjusted based on the particulars of each request, as well as current conditions of the data pipeline infrastructure 120, operator policy, and/or one or more client policies, and so forth.

It should be noted that the system 100 has been simplified. Thus, the system 100 may be implemented in a different form than that which is illustrated in FIG. 1, or may be expanded by including additional endpoint devices, access networks, network elements, application servers, etc. without altering the scope of the present disclosure. In addition, system 100 may be altered to omit various elements, substitute elements for devices that perform the same or similar functions, combine elements that are illustrated as separate devices, and/or implement network elements as functions that are spread across several devices that operate collectively as the respective network elements. For example, the system 100 may include other network elements (not shown) such as border elements, routers, switches, policy servers, security devices, gateways, a content distribution network (CDN) and the like, additional clouds, and so forth.

It should also be noted that the modules of data pipeline controller 110, the interrelationships and connections shown in FIG. 1, and so forth are illustrative of just one example of how data pipeline controller 110 may be organized and configured. For example, data pipeline component discovery module 118 may be split into two modules, with a separate module to keep track of active and inactive data pipelines, while data pipeline component discovery module 118 may continue to maintain an inventory of individual data pipeline components 127. In still another example, an additional module may be provided to store previously processed requests as request templates, to store request templates and the associations between request templates and information models, to provide the request templates to clients, to obtain feedback on the matching of requests and/or request templates to information models (and/or the resulting data pipelines), to learn and update associations between request templates and information models, and so forth. Thus, these and other modifications are all contemplated within the scope of the present disclosure.

FIG. 2 illustrates a flowchart of an example method 200 for generating a data schema for a type of data pipeline component and storing an ontology and the data schema for the type of data pipeline component in a catalog of data pipeline component types, in accordance with the present disclosure. In one example, the method 200 is performed by a component of the system 100 of FIG. 1, such as by the data pipeline controller 110, and/or any one or more components thereof (e.g., a processor, or processors, performing operations stored in and loaded from a memory and comprising any one or more of the modules 111-119). In one example, the steps, functions, or operations of method 200 may be performed by a computing device or system 300, and/or processor 302 as described in connection with FIG. 3 below. For instance, the computing device or system 300 may represent any one or more components of a data pipeline controller that is/are configured to perform the steps, functions and/or operations of the method 200. Similarly, in one example, the steps, functions, or operations of method 200 may be performed by a processing system comprising one or more computing devices collectively configured to perform various steps, functions, and/or operations of the method 200. For instance, multiple instances of the computing device or processing system 300 may collectively function as a processing system. For illustrative purposes, the method 200 is described in greater detail below in connection with an example performed by a processing system. The method 200 begins in step 205 and proceeds to step 210.

At step 210, the processing system obtains a first ontology of a first type of data pipeline component. For instance, an operator of the processing system (e.g., a data pipeline controller) may define an ontology format such that a vendor providing a new type of data pipeline component may also provide an ontology associated with the new type of data pipeline component.

At step 215, the processing system maps the first ontology to a second ontology for a second type of data pipeline component that is stored in a catalog of data pipeline component types. In one example, the mapping comprises determining a similarity between the second type of data pipeline component and the first type of data pipeline component. For example, the similarity may be based upon a congruence between the first ontology of the first type of data pipeline component and the second ontology of the second type of data pipeline component. For instance, the congruence (e.g., a metric or score that quantifies the extent of the matching) may be based upon a number of matches between features of the first ontology (e.g., at least one of classes, properties, or class restrictions) and features of the second ontology (e.g., at least one of classes, properties, or class restrictions). In one example, the congruence may apply different weights to matches among different features, e.g., depending upon the level of a feature within a hierarchy.
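A weighted congruence score of the kind described for step 215 can be sketched as a weighted count of shared features. The ontology representation (feature kind mapped to a set of feature names), the per-kind weights standing in for hierarchy level, and all names are assumptions for illustration.

```python
def congruence(first, second, weights):
    """Weighted count of ontology features found in both ontologies.
    Each ontology maps a feature kind ("classes", "properties",
    "restrictions") to a set of feature names."""
    return sum(
        weights.get(kind, 1.0) * len(names & second.get(kind, set()))
        for kind, names in first.items()
    )


def most_similar(new_ontology, catalog, weights):
    """Name of the catalog entry with the highest congruence to new_ontology."""
    return max(catalog, key=lambda name: congruence(new_ontology, catalog[name], weights))


weights = {"classes": 2.0, "properties": 1.0}  # assumed: higher-level features weigh more
new_type = {"classes": {"Collector", "Source"}, "properties": {"pollInterval", "endpoint"}}
catalog = {
    "collector-v1": {"classes": {"Collector", "Source"}, "properties": {"pollInterval"}},
    "aggregator-v2": {"classes": {"Aggregator"}, "properties": {"endpoint", "pollInterval"}},
}
print(most_similar(new_type, catalog, weights))  # collector-v1
```

The data schema of the best-scoring catalog entry would then serve as the template provided at step 220.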

At step 220, the processing system provides a second data schema for the second type of data pipeline component as a template for a first data schema for the first type of data pipeline component. For instance, step 215 may identify a second type of data pipeline component that is most similar to the first type of data pipeline component. For example, both of these types of data pipeline components may comprise “collectors.” In addition, in one example, both of these types of data pipeline components may be provided by a same vendor and/or may “match” with respect to one or more alternative or additional features. Accordingly, it may be observed that the second data schema is likely to provide the relevant configuration information for all or at least a significant portion of the available functions of the first type of data pipeline component.

At optional step 225, the processing system may present the template to an operator, e.g., via an endpoint device of an operator. The presentation may include options for the operator to modify, and/or to approve or deny the adoption of, the template as the first data schema. It should be noted that in one example, step 220 may comprise identifying a plurality of best matching existing types of data pipeline components from the catalog, and optional step 225 may comprise presenting the plurality of associated data schemas to the operator as template options.

Alternatively, the processing system may instead present the template(s) to an operator that is implemented by an automated system, e.g., a self-learning processing system or neural network. For instance, the automated system may comprise one or more artificial intelligence (AI) and/or machine learning (ML) modules which may be configured to analyze the template, to approve the template, and/or to provide a modification to the template, and so forth. For instance, the automated system may be trained from past user behaviors regarding presented templates and modifications (or lack thereof) made to such templates. Over time, the automated system may learn and predict how certain modifications should be made in response to new templates that are presented. For example, several vendors of a similar type of component may have recently provided new versions which include functionality defined in a newly released industry standard for which new data schemas have already been created and/or obtained. When a next vendor releases its own new version of the same component type, the automated system may implement a similar change to the template so as to provide a new data schema (e.g., that incorporates changes to address the new functionality that is shared across all vendors' newly released versions). In addition, feedback may be received over time regarding the automated decisions to further impact the learning of the automated system (e.g., via a reinforcement learning process), such that additional user observations may be omitted. It should be noted that such an automated system may be instantiated in accordance with any number of different machine learning models (MLMs) or machine learning algorithm(s). For example, a deep reinforcement learning (DRL) algorithm may be used in accordance with the present disclosure to train a deep neural network (DNN), such as a double deep Q network, and so forth.

At optional step 230, the processing system may obtain at least one change to the template from the operator (e.g., from a human or from an automated system). For instance, the operator may be aware that the new “first” type of data pipeline component is an upgraded version of the older “second” type of data pipeline component and has at least one new function. In this case, the operator may alter the template so as to include the configuration information for the at least one new function. In one example, the operator may also select among a plurality of possible templates (e.g., if presented at optional step 225).

At optional step 235, the processing system may change the template in accordance with the at least one change. In one example, the processing system may send test instructions to at least one instance of the first type of data pipeline component to verify that the function added by the modification exists.

At optional step 240, the processing system may obtain an approval of the operator to deploy the template (e.g., modified or unmodified) as the first data schema.

At step 245, the processing system adds the first type of data pipeline component to the catalog of data pipeline component types, where the adding comprises storing the first ontology and the first data schema for the first type of data pipeline component in the catalog. In one example, the first data schema that is stored in the catalog may comprise the template that is changed at optional step 235. In one example, steps 210-245 may include functions as described above in connection with the data schema generator/updater module 116 and the ontology and data schema repository 115 of FIG. 1.

At optional step 250, the processing system may identify a first information model that may be impacted by the adding of the first type of data pipeline component to the catalog, where the first information model comprises a flow sequence for a data pipeline (as well as data attribute relationships, in one example). For instance, optional step 250 may comprise determining that the first information model includes at least one hook that identifies the second type of data pipeline component.

At optional step 255, the processing system may provide at least one suggestion to an operator comprising at least one of: a suggestion to modify the first information model to incorporate the first type of data pipeline component, or a suggestion to create a new information model (e.g., based upon the first information model and that incorporates the first type of data pipeline component). It should be noted that incorporating the first type of data pipeline component may comprise replacing the second type of data pipeline component or inserting the first type of data pipeline component (without replacing the second type of data pipeline component). It should also be noted that optional steps 250 and 255 may further apply to additional information models that may be identified as potentially being impacted by the adding of the first type of data pipeline component to the catalog. In one example, optional steps 250 and 255 may include functions as described above in connection with the information model updater/generator module 113 and/or information model repository 114 of FIG. 1.

At optional step 260, the processing system may obtain a request for a delivery of a data set to at least one destination. In one example, the request may be in accordance with a request template. In one example, the request may comprise a plurality of parameters such as the desired data set, a specific data source or data sources, one or more target(s), a relevant time period for obtaining the data of the data set (e.g., for streaming and/or real-time data) and/or a relevant time period for which stored data is being requested, a specification of geographic bounds of the requested data set, one or more network regions for which data is being requested, other keywords, and so forth. In one example, the request may be formulated in accordance with a data definition language (DDL) that may be understood by the processing system.

At optional step 265, the processing system may map the request to the first information model from among a plurality of information models. For instance, the first information model may comprise first metadata relating to at least one of a name, a region, a task type, and so forth. Similarly, the request may comprise second metadata relating to at least one of: the name (e.g., an identification of one or more specific data sources and/or classes of data sources, one or more specific targets/destinations or classes of targets/destinations, an identifier of the requester and/or an organization of the requester, etc.), the region (e.g., a geographic indicator, an indicator of a portion of a network, a market segment, etc.), or the task type (e.g., “market intelligence,” “network load balancing,” “media event support,” etc.). As such, the mapping may comprise mapping the request to the first information model based upon a congruence between the first metadata and the second metadata. For instance, the congruence (e.g., a metric or score that quantifies the extent of the matching) may be based upon a number of matches between the metadata parameters.
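A minimal sketch of the congruence-based mapping, assuming that metadata is held as flat key/value dictionaries and that congruence is simply the count of exactly matching parameters. The class `InformationModel`, the functions `congruence` and `map_request`, and the sample metadata values are hypothetical illustrations, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InformationModel:
    model_id: str
    metadata: dict  # hypothetical, e.g. {"name": ..., "region": ..., "task_type": ...}


def congruence(model_meta: dict, request_meta: dict) -> int:
    """Count the request metadata parameters whose values match the model's exactly."""
    return sum(
        1 for key, value in request_meta.items()
        if key in model_meta and model_meta[key] == value
    )


def map_request(request_meta: dict, models: List[InformationModel]) -> Optional[InformationModel]:
    """Return the model with the highest congruence score, or None if no parameter matches."""
    best = max(models, key=lambda m: congruence(m.metadata, request_meta), default=None)
    if best is None or congruence(best.metadata, request_meta) == 0:
        return None
    return best


models = [
    InformationModel("im-1", {"region": "northeast", "task_type": "network load balancing"}),
    InformationModel("im-2", {"region": "northeast", "task_type": "market intelligence"}),
]
request = {"region": "northeast", "task_type": "market intelligence"}
print(map_request(request, models).model_id)  # im-2
```

A weighted score (e.g., counting a task-type match more heavily than a region match) would be an equally plausible reading of "congruence"; the plain match count is used here only because the text mentions "a number of matches".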

At optional step 270, the processing system may select a plurality of data schemas of a plurality of data pipeline component types in accordance with the first information model. For instance, the first information model may comprise hooks to the plurality of data schemas. In one example, optional steps 260-270 may include functions as described above in connection with the request interpreter and fulfillment module 111 and/or information model repository 114 of FIG. 1.

At optional step 275, the processing system may determine whether an existing data pipeline is available to handle the request. For example, the existing data pipeline may be determined to be available when its data pipeline components are arranged in the same manner as indicated in the first information model (and hence are of the correct data pipeline component types). Alternatively, the existing data pipeline may be determined to be available: (1) when it has the correct components that can be reconfigured to alternatively or additionally handle the current data delivery request, or (2) when the existing data pipeline does not have all of the specified components, but it has a sufficient number or percentage of the requisite components such that the processing system may select to modify/update this data pipeline to alternatively or additionally handle the current data request, rather than instantiate and arrange a new data pipeline.
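The three availability outcomes can be sketched as follows, assuming that pipelines are represented as ordered lists of component-type names and that "sufficient percentage" is a fixed threshold. The function `reuse_decision` and the threshold `min_fraction` (0.75 by default) are hypothetical choices for illustration only.

```python
from typing import Sequence


def reuse_decision(required: Sequence[str], existing: Sequence[str],
                   min_fraction: float = 0.75) -> str:
    """Classify an existing pipeline as "reuse", "modify", or "new".

    required: component types in the order given by the information model.
    existing: component types of the existing pipeline, in order.
    min_fraction: hypothetical threshold standing in for "sufficient percentage".
    """
    if list(existing) == list(required):
        return "reuse"  # same types in the same arrangement
    required_types = set(required)
    if not required_types:
        return "new"
    present = len(required_types & set(existing))
    if present / len(required_types) >= min_fraction:
        return "modify"  # reconfigure and/or add the missing components
    return "new"  # instantiate and arrange a new pipeline


print(reuse_decision(["ingest", "filter", "store"], ["ingest", "store", "filter"]))  # modify
print(reuse_decision(["ingest", "filter", "store"], ["ingest", "filter"]))          # new
```

A rearranged pipeline with all the right types falls under case (1) and a partial match above the threshold falls under case (2); both are reported as "modify" here.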

The processing system may perform optional step 280 when it is determined that an existing data pipeline is available to fulfill the request. Specifically, optional step 280 may comprise transmitting instructions to a plurality of data pipeline components of the existing data pipeline in accordance with the plurality of data schemas to configure the plurality of data pipeline components for delivering the data set to the at least one destination. In one example, optional step 280 may further include adding one or more additional data pipeline components to the data pipeline by transmitting instructions to the one or more additional data pipeline components in accordance with respective data schemas associated with the one or more additional data pipeline components to configure the one or more additional data pipeline components to function as part of the data pipeline for delivering the data set to the at least one destination.

On the other hand, the processing system may perform optional steps 285 and 290 when it is determined that no existing data pipeline is available to fulfill the request. Specifically, optional step 285 may comprise determining an availability of a plurality of data pipeline components. For instance, optional step 285 may comprise identifying the right data pipeline components of the right data pipeline component types, e.g., finding ones that are available, have capacity, and are geographically proximate or provide the best latency or other performance considering where the data is located (at the source(s) or at intermediate nodes where the requested data may have been previously copied), where the destination(s) is/are located, and so forth. The availabilities, capacities, proximity, and so forth may be determined based upon information stored by the processing system (such as by data pipeline component discovery module 118 of FIG. 1, for example).
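One way to express the selection in optional step 285 is to filter by type, availability, and spare capacity, and then prefer the lowest estimated latency to wherever the data currently resides. The `Candidate` record, its fields, and the sample inventory are hypothetical; a real discovery module could track many more attributes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    component_id: str
    component_type: str
    available: bool
    spare_capacity: float  # hypothetical units, e.g. Mbps of headroom
    latency_ms: float      # estimated latency to where the requested data currently resides


def pick_component(candidates: List[Candidate], component_type: str,
                   required_capacity: float) -> Optional[Candidate]:
    """Pick the lowest-latency available candidate of the right type with enough capacity."""
    eligible = [
        c for c in candidates
        if c.component_type == component_type
        and c.available
        and c.spare_capacity >= required_capacity
    ]
    return min(eligible, key=lambda c: c.latency_ms, default=None)


inventory = [
    Candidate("mq-east-1", "message_queue", True, 200.0, 12.0),
    Candidate("mq-east-2", "message_queue", True, 50.0, 4.0),
    Candidate("mq-west-1", "message_queue", False, 500.0, 3.0),
]
print(pick_component(inventory, "message_queue", 100.0).component_id)  # mq-east-1
```

Here `mq-east-2` is closer but lacks capacity, and `mq-west-1` is unavailable, so the sketch falls back to `mq-east-1`.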

At optional step 290, the processing system may transmit instructions to the plurality of data pipeline components in accordance with the plurality of data schemas to configure the plurality of data pipeline components into a data pipeline, where the data pipeline is for delivering the data set to the at least one destination. In one example, optional steps 275-290 may include functions as described above in connection with the request interpreter and fulfillment module 111, data pipeline management and assembly module 117, data pipeline component discovery module 118, and/or authorization module 112 of FIG. 1.

Following optional step 280 or optional step 290, the method 200 proceeds to step 295 where the method ends.

It should be noted that the method 200 may be expanded to include additional steps, or may be modified to replace steps with different steps, to combine steps, to omit steps, to perform steps in a different order, and so forth. For instance, in one example the processing system may repeat one or more steps of the method 200, such as steps 210-245 for adding additional types of data pipeline components to the catalog, steps 260-280 or steps 260-290 for additional requests for delivery of data, and so forth. In one example, the method 200 may be expanded to include obtaining an ontology and a data schema for a new type of data pipeline component (such as from a vendor of the new type of data pipeline component) and adding the new type of data pipeline component to the catalog (e.g., without performing steps 215 and 220, since the data schema is already provided). In one example, the method 200 may be expanded to include obtaining a request to search a catalog of data pipeline components and providing access to all or a portion of the catalog. In another example, the method 200 may be expanded to include providing one or more request templates to a client device and receiving a selection of one of the request templates. For instance, in such an example, the request that is obtained at optional step 260 may be in accordance with a request template. For example, a client, via a client device, may provide certain details which may be plugged into the template, such as specific dates, times, source(s), locations or regions, target(s), etc. In still another example, the order of optional steps 235 and 240 may be reversed. Thus, these and other modifications are all contemplated within the scope of the present disclosure.

In addition, although not expressly specified above, one or more steps of the method 200 may include a storing, displaying and/or outputting step as required for a particular application. In other words, any data, records, fields, and/or intermediate results discussed in the method can be stored, displayed and/or outputted to another device as required for a particular application. Furthermore, operations, steps, or blocks in FIG. 2 that recite a determining operation or involve a decision do not necessarily require that both branches of the determining operation be practiced. In other words, one of the branches of the determining operation can be deemed as an optional step. However, the use of the term “optional step” is intended only to reflect different variations of a particular illustrative embodiment and is not intended to indicate that steps not labelled as optional steps are to be deemed essential steps. Furthermore, operations, steps or blocks of the above described method(s) can be combined, separated, and/or performed in a different order from that described above, without departing from the example embodiments of the present disclosure.

FIG. 3 depicts a high-level block diagram of a computing device or processing system specifically programmed to perform the functions described herein. For example, any one or more components or devices illustrated in FIG. 1 or described in connection with the example methods 200 or 500 may be implemented as the processing system 300. As depicted in FIG. 3, the processing system 300 comprises one or more hardware processor elements 302 (e.g., a microprocessor, a central processing unit (CPU) and the like), a memory 304 (e.g., random access memory (RAM), read only memory (ROM), a disk drive, an optical drive, a magnetic drive, and/or a Universal Serial Bus (USB) drive), a module 305 for generating a data schema for a type of data pipeline component and storing an ontology and the data schema for the type of data pipeline component in a catalog of data pipeline component types and/or for configuring data pipeline components for delivering a first data set to at least a first destination and for delivering a second data set to at least the second destination in accordance with a plan comprising a combination of a first information model associated with a first request and a second information model associated with a second request and including at least one modification to the combination, and various input/output devices 306, e.g., a camera, a video camera, storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive, a receiver, a transmitter, a speaker, a display, a speech synthesizer, an output port, and a user input device (such as a keyboard, a keypad, a mouse, and the like).

Although only one processor element is shown, it should be noted that the computing device may employ a plurality of processor elements. Furthermore, although only one computing device is shown in the Figure, if the method(s) as discussed herein is/are implemented in a distributed or parallel manner for a particular illustrative example, i.e., the steps of the method(s) or the entire method(s) are implemented across multiple or parallel computing devices, e.g., a processing system, then the computing device of this Figure is intended to represent each of those multiple computers. Furthermore, one or more hardware processors can be utilized in supporting a virtualized or shared computing environment. The virtualized computing environment may support one or more virtual machines representing computers, servers, or other computing devices. In such virtual machines, hardware components such as hardware processors and computer-readable storage devices may be virtualized or logically represented. The hardware processor 302 can also be configured or programmed to cause other devices to perform one or more operations as discussed herein. In other words, the hardware processor 302 may serve the function of a central controller directing other devices to perform the one or more operations as discussed above (and/or below).

It should be noted that the present disclosure can be implemented in software and/or in a combination of software and hardware, e.g., using application specific integrated circuits (ASIC), a programmable logic array (PLA), including a field-programmable gate array (FPGA), or a state machine deployed on a hardware device, a computing device, or any other hardware equivalents, e.g., computer readable instructions pertaining to the method(s) discussed herein can be used to configure a hardware processor to perform the steps, functions and/or operations of the herein described method(s). In one example, instructions and data for the present module or process 305 for generating a data schema for a type of data pipeline component and storing an ontology and the data schema for the type of data pipeline component in a catalog of data pipeline component types and/or for configuring data pipeline components for delivering a first data set to at least a first destination and for delivering a second data set to at least the second destination in accordance with a plan comprising a combination of a first information model associated with a first request and a second information model associated with a second request and including at least one modification to the combination (e.g., a software program comprising computer-executable instructions) can be loaded into memory 304 and executed by hardware processor element 302 to implement the steps, functions or operations as discussed above in connection with the example method 200 and/or as discussed below in connection with the example method 500. Furthermore, when a hardware processor executes instructions to perform “operations,” this could include the hardware processor performing the operations directly and/or facilitating, directing, or cooperating with another hardware device or component (e.g., a co-processor and the like) to perform the operations.

The processor executing the computer readable or software instructions relating to the described method(s) can be perceived as a programmed processor or a specialized processor. As such, the present module 305 for generating a data schema for a type of data pipeline component and storing an ontology and the data schema for the type of data pipeline component in a catalog of data pipeline component types and/or for configuring data pipeline components for delivering a first data set to at least a first destination and for delivering a second data set to at least the second destination in accordance with a plan comprising a combination of a first information model associated with a first request and a second information model associated with a second request and including at least one modification to the combination (including associated data structures) of the present disclosure can be stored on a tangible or physical (broadly non-transitory) computer-readable storage device or medium, e.g., volatile memory, non-volatile memory, ROM memory, RAM memory, magnetic or optical drive, device or diskette and the like. Furthermore, a “tangible” computer-readable storage device or medium comprises a physical device, a hardware device, or a device that is discernible by the touch. More specifically, the computer-readable storage device may comprise any physical devices that provide the ability to store information such as data and/or instructions to be accessed by a processor or a computing device such as a computer or an application server.

To further aid in understanding the present disclosure, FIG. 4 illustrates a set 400 of example scenarios of establishing a new plan for configuring data pipeline infrastructure via data blending in accordance with multiple requests for overlapping data. To illustrate, scenario 1 shows a first example relating to two requests having associated information models (information model 1 and information model 2, respectively). For instance, information model 1 may comprise a specification for an example data pipeline configuration for delivery of data from data sources A and E to target D. The data pipeline configuration according to information model 1 further includes intermediate nodes B and C, as illustrated. Similarly, information model 2 may comprise a specification for an example data pipeline configuration for delivery of data from data sources E and F to target I. The data pipeline configuration according to information model 2 further includes intermediate nodes G and H, as illustrated. In the present example, a data pipeline controller (e.g., via a data blending module (such as the data blending module 119 of FIG. 1)) may establish a new plan which may comprise a combination of information model 1 and information model 2, along with at least one modification.

For instance, in the example scenario 1 of FIG. 4, the new plan may include a configuration of node J to include the functions of node G, along with a new intermediate storage function (that is not specified in the information models 1 and 2). For instance, node J may store the data from source E for distribution to nodes B and H for respective data pipelines associated with the first and second requests. For example, the first request may be for streaming data from source A and from source E, while the time period for obtaining data from source E ends before the time period for obtaining data from source A. In addition, the second request may be for streaming data from sources E and F, where the time period for obtaining data from source E ends before the time period for obtaining data from source F. However, the data source E may be bandwidth-limited or may be in a geographic region or network zone that has a large variance in utilization at different times of day (e.g., daytime and overnight). In such case, the data pipeline controller (e.g., via the data blending module) may determine that it is more efficient (e.g., any one or more of: faster, less latency, less expensive bandwidth utilization, etc.) to copy the data from node E to an intermediate node J for temporary storage, and then to distribute to nodes B and H (e.g., at respective times when the last data from sources A and/or F are available, when the network utilization between nodes J, B, C, D and/or nodes J, H, and I is least costly, etc.).

In the next example, scenario 2, a first request may be associated with information model 1, which may comprise a specification for an example data pipeline configuration for delivery of data from data source A to target D. The first request may specify a first data set (data set 1) comprising data from source A between a time T1 and a time T2. Similarly, a second request may be associated with information model 2, which may comprise a specification for an example data pipeline configuration for delivery of data from data source A to target E. However, the second request may specify a second data set (data set 2) comprising data from source A between a time T3 and a time T4. As can be seen in FIG. 4, these time periods overlap between T3 and T2. Thus, the requests are for at least a portion of the same data. In this case, both requests may be received by a data pipeline controller, which may determine that data blending is permitted, and which may then determine a new plan for configuring data pipeline components of a data pipeline infrastructure to fulfill both requests. For instance, in this case, the data pipeline controller (e.g., via a data blending module) may determine a new plan as illustrated in FIG. 4 in which source A and nodes B and C are shared by both requests (e.g., these components will be considered part of two data pipelines or a shared data pipeline that may be established in accordance with the new plan). From node C, the respectively requested data may be distributed to the targets D and E. For example, since there is a temporal overlap, the new plan may comprise a specification that calls for storage to be added at node C (if neither of information models 1 and 2 specifies storage at node C) or for additional storage to be added at node C (e.g., if information model 1 specifies storage at node C, but the designated amount of storage may be insufficient for a longer duration and/or quantity of data storage to additionally fulfill the second request for data set 2).
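The temporal overlap in scenario 2 is the intersection of two half-open intervals: it starts at the later of the two start times and ends at the earlier of the two end times. The sketch below assumes that requested periods are given as `datetime` pairs; the function name `time_overlap` and the concrete timestamps are hypothetical.

```python
from datetime import datetime
from typing import Optional, Tuple


def time_overlap(start_1: datetime, end_1: datetime,
                 start_2: datetime, end_2: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Return the window requested by both data sets, or None if the periods are disjoint."""
    start = max(start_1, start_2)
    end = min(end_1, end_2)
    return (start, end) if start < end else None


# Hypothetical timestamps ordered T1 < T3 < T2 < T4, as in scenario 2.
T1 = datetime(2020, 3, 1, 0, 0)
T2 = datetime(2020, 3, 1, 6, 0)
T3 = datetime(2020, 3, 1, 4, 0)
T4 = datetime(2020, 3, 1, 10, 0)

shared = time_overlap(T1, T2, T3, T4)
print(shared == (T3, T2))  # True: the data between T3 and T2 is requested by both
```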

In the third example, scenario 3, a first request may be associated with information model 1, which may comprise a specification for an example data pipeline configuration for delivery of a data set (data set 1) from data source A to target D. A second request may be associated with information model 2, which may comprise a specification for an example data pipeline configuration for delivery of a data set (data set 2) from data source A to target G. In this example, the data pipeline controller (e.g., via a data blending module) may determine that data sets 1 and 2 partially overlap. For instance, both data sets include a same subset of data (data subset 2). For illustrative purposes, non-overlapping subsets (data subsets 1 and 3) that are exclusive to data set 1 and data set 2, respectively, are also labeled in FIG. 4. Continuing with the present example, it may be determined that A-E-F may be a most efficient path (e.g., least or lower cost, faster/reduced latency, higher bandwidth availability, etc.) and that nodes E and F may provide the same or similar functions as nodes B and C. Initially, it may be considered that all of data sets 1 and 2 should be routed via A-E-F, with final distribution from node F to targets D and G, respectively. However, as noted above, a data pipeline controller (e.g., via a data blending module) may ensure that shared plans via data blending do not violate operator and/or client policies. In this case, a client associated with the first request and the first information model may have a policy that specifies a geographic restriction on processing of certain sensitive data, which may include data subset 1, whereas data subset 2 may not have a similar protection/restriction. In this case, a new plan may be selected which efficiently routes data subsets 2 and 3 via A-E-F to fulfill all of the second request, and to partially fulfill the first request.
The remaining part of the requested data set 1 for the first request (i.e., data subset 1) may be routed via A-B-C, and from C to target D for final delivery. For instance, nodes B and C may be established in allowable geographic locations, in allowable vendor infrastructure (e.g., in specified cloud provider infrastructure(s)), and so forth, in accordance with the policy of the first client associated with the first request. Notably, the overall cost, latency, bandwidth utilization, or other factors may be improved by this split routing as compared to a plan comprising information model 1 and information model 2 without coordination and without such modification as indicated in the example scenario 3.
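The split routing of scenario 3 can be sketched as assigning each data subset to the most efficient trunk path that its policy allows. The sketch assumes that the geographic restriction is expressed as a set of permitted zones and that candidate trunks are already ranked by efficiency; the zone labels in `NODE_ZONE`, and the functions `trunk_allowed` and `route_subsets`, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical zone assignment for the nodes of scenario 3.
NODE_ZONE = {"A": "zone-1", "B": "zone-1", "C": "zone-1", "E": "zone-2", "F": "zone-2"}


def trunk_allowed(trunk: Sequence[str], allowed_zones: Optional[Set[str]]) -> bool:
    """A trunk is allowed if the subset is unrestricted or every node sits in a permitted zone."""
    return allowed_zones is None or all(NODE_ZONE[n] in allowed_zones for n in trunk)


def route_subsets(restrictions: Dict[str, Optional[Set[str]]],
                  trunks: List[Tuple[str, ...]]) -> Dict[str, Tuple[str, ...]]:
    """Assign each data subset to the first (most efficient) trunk that its policy allows.

    Subsets for which no trunk is allowed are left out of the returned plan.
    """
    plan = {}
    for subset, allowed in restrictions.items():
        for trunk in trunks:  # ordered from most to least efficient
            if trunk_allowed(trunk, allowed):
                plan[subset] = trunk
                break
    return plan


restrictions = {"subset_1": {"zone-1"}, "subset_2": None, "subset_3": None}
trunks = [("A", "E", "F"), ("A", "B", "C")]
print(route_subsets(restrictions, trunks))
# {'subset_1': ('A', 'B', 'C'), 'subset_2': ('A', 'E', 'F'), 'subset_3': ('A', 'E', 'F')}
```

Only the shared trunk is checked here; the final hops from F or C to targets D and G would be appended per request.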

FIG. 5 illustrates a flowchart of an example method 500 for configuring data pipeline components for delivering a first data set to at least a first destination and for delivering a second data set to at least the second destination in accordance with a plan comprising a combination of a first information model associated with a first request and a second information model associated with a second request and including at least one modification to the combination, in accordance with the present disclosure. In one example, the method 500 is performed by a component of the system 100 of FIG. 1, such as by the data pipeline controller 110, and/or any one or more components thereof, such as a data blending module 119 (e.g., a processor, or processors, performing operations stored in and loaded from a memory). In one example, the steps, functions, or operations of method 500 may be performed by a computing device or system 300, and/or processor 302 as described in connection with FIG. 3. For instance, the computing device or system 300 may represent any one or more components of a data pipeline controller that is/are configured to perform the steps, functions and/or operations of the method 500. Similarly, in one example, the steps, functions, or operations of method 500 may be performed by a processing system comprising one or more computing devices collectively configured to perform various steps, functions, and/or operations of the method 500. For instance, multiple instances of the computing device or processing system 300 may collectively function as a processing system. For illustrative purposes, the method 500 is described in greater detail below in connection with an example performed by a processing system. The method 500 begins in step 505 and proceeds to step 510.

At step 510, the processing system obtains a first request for a delivery of a first data set to at least a first destination. For instance, the processing system may comprise a data pipeline controller and may obtain the first request via a request interpreter and fulfillment module (such as request interpreter and fulfillment module 111 of data pipeline controller 110 of FIG. 1). Step 510 may comprise the same or similar operations as described above in connection with optional step 260 of the example method 200.

At step 520, the processing system maps the first request to a first information model of a plurality of information models. For instance, the first information model may comprise first metadata relating to at least one of a name, a region, a task type, a technology, and so forth. Similarly, the first request may comprise second metadata relating to at least one of: the name, the region, the task type, the technology, etc. As such, the mapping may comprise mapping the request to the first information model based upon a congruence between the first metadata and the second metadata. In one example, step 520 may comprise the same or similar operations as described above in connection with optional step 265 of the example method 200.

At step 530, the processing system obtains a second request for a delivery of a second data set to at least a second destination. For instance, step 530 may comprise the same or similar operations as described above in connection with step 510 (and/or in connection with optional step 260 of the example method 200), but with respect to the second request, where the second request has different parameters from the first request.

At step 540, the processing system maps the second request to a second information model of the plurality of information models. For instance, step 540 may comprise the same or similar operations as described above in connection with step 520 (and/or in connection with optional step 265 of the example method 200), but with respect to the second request, where the second request has different parameters from the first request.

At step 550, the processing system identifies that at least a portion of data is a part of both the first data set and the second data set. For example, the respective parameters of the first request and the second request may specify the data sets being requested, such as by including one or more of: an identifier or identifiers of the data source(s), the data set name/label, fields within the respective data sets, date/time range(s) being requested, and so forth. In one example, the identification of the at least the portion of the data being a part of both the first data set and the second data set may additionally be discerned by the processing system from one or both of the respective information models. For instance, the first request may indicate desired data of the first data set by indicating a database name. On the other hand, the second request may indicate desired data of the second data set by indicating a type of data being requested. However, the second information model may include a specification from which the database containing the type of data being requested may be identified (and which may be the same database as that which is identified by name in the first request). Thus, the processing system may determine that at least a portion of the data in the database is being requested as part of both the first data set and the second data set.
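The database-name versus data-type example above can be sketched by resolving each request to a set of databases, using the information model to translate data types, and then intersecting the two sets. The request keys `databases` and `data_types`, the dictionary standing in for an information model's specification, and the names `network_kpi_db` and `cell_load` are all hypothetical.

```python
from typing import Dict, Set


def resolve_databases(request: dict, type_to_database: Dict[str, str]) -> Set[str]:
    """Databases a request touches: named directly, or resolved from a data type via the model."""
    databases = set(request.get("databases", []))
    for data_type in request.get("data_types", []):
        if data_type in type_to_database:
            databases.add(type_to_database[data_type])
    return databases


def shared_databases(request_1: dict, model_1: Dict[str, str],
                     request_2: dict, model_2: Dict[str, str]) -> Set[str]:
    """Databases from which both requests draw at least a portion of their data."""
    return resolve_databases(request_1, model_1) & resolve_databases(request_2, model_2)


request_1 = {"databases": ["network_kpi_db"]}   # names the database directly
request_2 = {"data_types": ["cell_load"]}        # names only a type of data
model_2 = {"cell_load": "network_kpi_db"}        # second model resolves type -> database
print(shared_databases(request_1, {}, request_2, model_2))  # {'network_kpi_db'}
```

A fuller check would then narrow the shared portion by fields and time ranges within each shared database.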

At step 560, the processing system determines a plan for configuring data pipeline components for delivering the first data set to the at least the first destination and for delivering the second data set to the at least the second destination, where the plan comprises a combination of the first information model and the second information model, and where the plan comprises at least one modification to the combination of the first information model and the second information model. In one example, step 560 is performed in response to a determination that data blending is permitted.

In one example, the at least one modification may include an omission of at least one data pipeline component that is present in at least one of the first information model or the second information model. For example, the processing system may determine that at least the portion of the data that is being requested according to both the first request and the second request may be stored at an intermediate node (e.g., a data pipeline component) such that the at least the portion of the data does not need to be again obtained from the source. Thus, the source and/or one or more other intermediate nodes/data pipeline components between the source and the intermediate node where the at least the portion of the data is stored may be omitted. In one example, the at least one modification may include an addition of at least one data pipeline component that is not present in the first information model and the second information model. For instance, an intermediate storage node may be added to store the at least the portion of the data, where the intermediate storage node would not be utilized according to the first information model and according to the second information model. This could be the case where the first data set and the second data set are partially overlapping in time (and where the overlap comprises the at least the portion of the data).
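The omission case can be sketched as trimming a source-to-target path so that it starts at the node nearest the target that already stores the shared portion. This assumes that paths are ordered lists of node identifiers and that the storing node lies on the path; `prune_upstream` and the node labels are hypothetical.

```python
from typing import List, Sequence, Set


def prune_upstream(path: Sequence[str], holds_shared_data: Set[str]) -> List[str]:
    """Omit nodes upstream of the node closest to the target that already holds the shared data.

    path: node identifiers from source to target, per one information model.
    holds_shared_data: nodes (e.g., in the other request's pipeline) already storing the portion.
    """
    for i in range(len(path) - 1, -1, -1):  # scan from the target back toward the source
        if path[i] in holds_shared_data:
            return list(path[i:])
    return list(path)  # nothing stored on this path; keep it unchanged


print(prune_upstream(["A", "B", "C", "E"], {"C"}))  # ['C', 'E']
```

Source A and node B are omitted for the shared portion, since node C can serve it directly.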

In one example, the at least one modification may include an alteration to at least one setting for at least one data pipeline component that is present in at least one of the first information model or the second information model. For instance, in one example, the alteration to the at least one setting may comprise changing a storage duration of at least the portion of the data at the at least one data pipeline component. In one example, the changing of the storage duration may comprise changing from a “no storage” setting to a “storage” setting (e.g., with a duration of the storage being specified). Alternatively, or in addition, the alteration to the at least one setting may include changing a location criterion for the at least one data pipeline component. For instance, an intermediate node of a first data pipeline in accordance with the first information model may be specified to be located as close as possible to one or more preceding nodes, or to one or more following nodes, or may be specified to be located intermediate between one or more preceding nodes and one or more following nodes, etc. However, the at least the portion of the data may be obtained from a node other than the source (or one or more sources), which may alter the ideal or preferred location(s) for the at least one data pipeline component (e.g., as would otherwise be specified according to the first information model or the second information model).

In one example, the at least one modification to the combination of the first information model and the second information model is selected for the plan based upon a determination of a reduction in an overall number of data pipeline components (and/or intermediate processing steps) according to the plan as compared to the combination of the first information model and the second information model without the modification. In one example, the at least one modification is selected for the plan based upon a determination of a reduction in a network bandwidth utilization according to the plan as compared to the combination of the first information model and the second information model without the modification. In one example, the at least one modification is selected for the plan based upon a determination of a reduction in a latency of a delivery of at least one of the first data set or the second data set for the plan as compared to the combination of the first information model and the second information model without the modification. In one example, the at least one modification is selected for the plan based upon a determination of a reduction in a cost of a delivery of at least one of the first data set or the second data set for the plan as compared to the combination of the first information model and the second information model without the modification. It should be noted that reductions in the other factors above (e.g., fewer components, less bandwidth) can also result in reduced costs.

In one example, the processing system may create the plan by initially attempting to reduce the number of data pipeline components by consolidating separate data pipeline components from two pipelines into a single function (a single data pipeline component that is shared by the two pipelines). The processing system may then select a location (or provide location selection criteria) for the shared data pipeline component. For instance, the location selection criteria can specify using an existing location, if one of the two data pipelines is already established and contains an instance of the data pipeline component, and otherwise placing the shared data pipeline component in a location that minimizes latency or maximizes throughput based upon location criteria of both data pipelines (e.g., locations of source(s) and/or target(s), and/or locations of preceding and/or following nodes according to the first information model and the second information model (upon which the plan is initially based)).
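The consolidate-then-place approach above can be sketched in two small functions. The sketch assumes that each component type appears at most once per pipeline (so that sharing collapses the union of types), and it uses "minimise the summed latency to neighbouring nodes" as one possible placement rule; the site names, latency table, and both function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def consolidated_count(pipeline_1: Sequence[str], pipeline_2: Sequence[str]) -> int:
    """Components needed when each type common to both pipelines is shared once.

    Assumes each component type appears at most once per pipeline.
    """
    return len(set(pipeline_1) | set(pipeline_2))


def place_shared_component(existing_site: Optional[str], candidate_sites: List[str],
                           neighbours: List[str],
                           latency_ms: Dict[Tuple[str, str], float]) -> str:
    """Reuse an already-established instance's site; otherwise minimise summed latency."""
    if existing_site is not None:
        return existing_site
    return min(candidate_sites,
               key=lambda site: sum(latency_ms[(site, n)] for n in neighbours))


p1 = ["ingest", "normalize", "store"]
p2 = ["ingest", "normalize", "publish"]
print(len(p1) + len(p2), "->", consolidated_count(p1, p2))  # 6 -> 4

latency = {("s1", "A"): 5, ("s1", "D"): 10, ("s1", "I"): 30,
           ("s2", "A"): 8, ("s2", "D"): 9, ("s2", "I"): 12}
print(place_shared_component(None, ["s1", "s2"], ["A", "D", "I"], latency))  # s2
```

Site `s2` wins (29 ms total versus 45 ms) even though `s1` is closer to source A. A throughput-maximising rule would simply swap the key function.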

In one example, the processing system may select to add storage when at least a portion of the data is identified as arising from a partial overlap in time between the first data set and the second data set. In addition, the processing system may further select between: (1) adding storage to an existing node, (2) adding storage to a node specified in the first data pipeline according to the first information model or to a node specified in the second data pipeline according to the second information model, or (3) adding a new node that is not specified in either the first information model or the second information model. The selection may be based upon a calculated overall efficiency in data delivery (e.g., reduced latency) or upon a calculated reduction in latency balanced against a cost of deploying the new node. It should be noted that the cost can be monetary, or the cost can be additional resource utilization (e.g., processor, memory, or available storage in the data pipeline environment or in a portion/region of the data pipeline environment, a number of available nodes of a particular type in the data pipeline environment or in a portion/region of the data pipeline environment, additional network bandwidth incurred to store at the new node instead of streaming directly to the target(s), etc.).
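The following Python sketch illustrates, under stated assumptions, the two decisions described above: detecting the time interval shared by two requested data ranges, and choosing among the three storage placement options by trading latency reduction against deployment cost. The names `shared_time_window`, `StorageOption`, `choose_storage`, and the linear `cost_weight` trade-off are hypothetical choices for illustration, not the method of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def shared_time_window(first: Tuple[float, float],
                       second: Tuple[float, float]) -> Optional[Tuple[float, float]]:
    """Return the interval during which both requested time ranges overlap, or None.
    Ranges are (start, end) timestamps; a non-empty result identifies the overlapping
    portion of data for which adding storage may be considered."""
    start, end = max(first[0], second[0]), min(first[1], second[1])
    return (start, end) if start < end else None


@dataclass(frozen=True)
class StorageOption:
    """Hypothetical storage placement option with estimated benefit and cost."""
    name: str  # e.g., existing node, node from an information model, or new node
    latency_reduction_ms: float
    deployment_cost: float  # monetary or resource-utilization units (assumption)


def choose_storage(options: List[StorageOption], cost_weight: float) -> StorageOption:
    """Pick the option maximizing latency reduction minus weighted deployment cost.
    cost_weight = 0 selects purely on latency; larger values favor cheaper options."""
    return max(options, key=lambda o: o.latency_reduction_ms - cost_weight * o.deployment_cost)
```

Setting `cost_weight` to zero corresponds to selecting on overall delivery efficiency alone, while a positive weight corresponds to balancing the latency reduction against the cost of deploying a new node.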

In one example, the at least one modification is selected in accordance with an operator policy of an operator of the data pipeline environment. In one example, the operator policy balances a reduction in an overall number of data pipeline components with a reduction in a latency of a delivery of at least one of the first data set or the second data set. In one example, the at least one modification may be selected to balance reduced bandwidth with reduced latency. In one example, the selection of the at least one modification may involve balancing multiple factors. In addition, in one example, the balancing may include a weighting to favor and/or disfavor certain factors in contributing to a solution.
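One simple way to realize the weighting described above is a linear weighted score, sketched below in Python. The functions `weighted_score` and `pick_by_policy`, the factor names, and the assumption that reductions have already been normalized to comparable scales are all illustrative; an operator policy could equally use thresholds or lexicographic priorities.

```python
from typing import Dict


def weighted_score(reductions: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-factor reductions (positive = improvement) with operator weights.
    A negative weight disfavors a factor; factors without a weight contribute nothing.
    Assumes the reductions are already normalized to comparable scales."""
    return sum(weights.get(factor, 0.0) * value for factor, value in reductions.items())


def pick_by_policy(candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> str:
    """Return the name of the candidate modification with the highest weighted score."""
    return max(candidates, key=lambda name: weighted_score(candidates[name], weights))
```

For example, with one candidate that removes two components but adds latency and another that adds a cache node but cuts latency, a policy weighting component reduction heavily selects the first, while a policy weighting latency heavily selects the second.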

At optional step 570, the processing system may verify that the at least one modification does not violate a client policy of a client associated with the first request or the second request. In one example, the client policy is contained in one of the first request or the second request. Alternatively, or in addition, the client policy may be maintained by the processing system on behalf of the client. The client policy may specify a restriction on at least one of: a location of at least one data pipeline component, a sharing of the at least one data pipeline component (e.g., among different data pipelines), or an access of other clients to at least a portion of the first data set or the second data set. It should be noted that in one example, optional step 570 may include verifying that the modification does not violate policies of both (or several) clients (e.g., if different clients are associated with the respective requests).
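A minimal Python sketch of the verification in optional step 570 is given below, assuming a client policy can be represented as an optional set of permitted locations, a flag permitting component sharing, and an optional set of other clients permitted to access the client's data. The classes `ClientPolicy` and `Modification` and the functions `violations` and `verify` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ClientPolicy:
    """Hypothetical client policy covering the three restriction types in the text."""
    client_id: str
    allowed_locations: Optional[FrozenSet[str]] = None  # None: no location restriction
    allow_component_sharing: bool = True
    allowed_co_clients: Optional[FrozenSet[str]] = None  # None: no access restriction


@dataclass(frozen=True)
class Modification:
    """Hypothetical summary of what a proposed modification would do."""
    component_location: str
    shares_component: bool
    exposes_data_to: FrozenSet[str]  # clients that could access the shared data


def violations(mod: Modification, policy: ClientPolicy) -> List[str]:
    """Return the names of the restrictions in `policy` that `mod` would violate."""
    problems: List[str] = []
    if policy.allowed_locations is not None and mod.component_location not in policy.allowed_locations:
        problems.append("location")
    if mod.shares_component and not policy.allow_component_sharing:
        problems.append("sharing")
    if policy.allowed_co_clients is not None:
        # Any client other than the policy owner and its permitted co-clients is a violation.
        others = mod.exposes_data_to - {policy.client_id} - policy.allowed_co_clients
        if others:
            problems.append("access")
    return problems


def verify(mod: Modification, policies: List[ClientPolicy]) -> bool:
    """True only if the modification violates none of the supplied client policies."""
    return all(not violations(mod, p) for p in policies)
```

Passing the policies of every client associated with the respective requests to `verify` corresponds to checking the policies of both (or several) clients.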

At step 580, the processing system configures the data pipeline components for delivering the first data set to the at least the first destination and for delivering the second data set to the at least the second destination in accordance with the plan. In one example, step 580 may be performed via a data pipeline management and assembly (DPMA) module (such as DPMA module 117 of FIG. 1). For example, step 580 may include various operations as described in connection with any one or more of optional steps 270-290 of the example method 200. For instance, as noted above, the plan may be processed similar to an information model, and may comprise a specification for selecting and configuring data pipeline components. In this case, the plan may be for configuring two data pipelines (which may alternatively or additionally be considered to be a single, shared data pipeline). However, the plan may still include hooks to the plurality of data schemas for a plurality of data pipeline components, specific configuration parameters, and so forth, similar to the information models upon which the plan is based.

It should be noted that in some cases, the plan may have more specificity (and hence less flexibility) in selecting and configuring data pipeline components as compared to the information models. For instance, the plan may be generated by the processing system via a data blending module, provided to a request interpreter and fulfillment module, and forwarded to a data pipeline management and assembly (DPMA) module for execution. For example, as described above in connection with the example of FIG. 1, in order to select a specific instance of a data pipeline component to include in a data pipeline, DPMA module 117 may evaluate criteria contained in the information model(s)/specifications in view of the current topology of the data pipeline infrastructure 120, the availability of data pipeline components 127, the operator policy, the client policies, etc. However, as also noted above, in one example, the data blending module 119 may refer to the data pipeline component discovery module 118, in which case, the data blending module 119 may include specific data pipeline components in the plan. This may be in contrast to an information model, which may more generally specify a type of data pipeline component, location criteria, and settings/configurations to apply, where the DPMA module 117 may select the actual instance of the data pipeline component based upon these factors.
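The contrast drawn above, between a plan that names a specific component instance and an information model that only names a component type and location criteria, can be sketched in Python as follows. The `Instance` and `Spec` classes, the `resolve` function, and the first-match selection rule are assumptions for illustration; an actual DPMA module would also weigh topology, operator policy, and client policies as described.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Instance:
    """Hypothetical inventory entry for a deployed data pipeline component."""
    instance_id: str
    kind: str
    location: str
    available: bool = True


@dataclass(frozen=True)
class Spec:
    """Hypothetical plan/model entry: a component type plus location criteria,
    optionally pinned to a specific instance (as a plan from data blending may be)."""
    kind: str
    allowed_locations: FrozenSet[str]
    instance_id: Optional[str] = None


def resolve(spec: Spec, inventory: List[Instance]) -> Instance:
    """Return the instance to configure for `spec`, raising LookupError if none fits."""
    if spec.instance_id is not None:
        # Plan specifies the exact instance: honor it if it is available.
        for inst in inventory:
            if inst.instance_id == spec.instance_id and inst.available:
                return inst
        raise LookupError(f"pinned instance {spec.instance_id} is unavailable")
    # Model specifies only a type and criteria: take the first matching instance.
    for inst in inventory:
        if inst.kind == spec.kind and inst.available and inst.location in spec.allowed_locations:
            return inst
    raise LookupError(f"no available {spec.kind} in {sorted(spec.allowed_locations)}")
```

The pinned branch reflects the greater specificity (and reduced flexibility) of a plan, while the unpinned branch leaves instance selection to the assembly step.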

Following step 580, the method 500 proceeds to step 595 where the method ends.

It should be noted that the method 500 may be expanded to include additional steps, or may be modified to replace steps with different steps, to combine steps, to omit steps, to perform steps in a different order, and so forth. For instance, in one example the processing system may repeat one or more steps of the method 500, such as steps 510-580 for additional requests for delivery of data, and so forth. In one example, the method 500 may be expanded to include processing additional requests for at least the portion of the data, e.g., a third request, a fourth request, etc., any or all of which may be requesting the same portion of data. In one example, the method 500 may include additional operations in connection with determining that a request is not requesting the same portion of data. For instance, the method 500 may include operations to fulfill the request via an associated information model, without coordination and modification via a shared plan for other requests. In another example, the method 500 may be expanded to include determining that a data blending is permitted. For instance, prior to step 550 and/or step 560, the processing system may determine that a "blending by default" applies to both requests. In one example, the processing system may determine that data blending is permitted for both requests according to respective client policies that may be stored by the processing system, and/or according to a policy of an operator of the data pipeline environment. Alternatively, or in addition, the processing system may determine that data blending is authorized via parameters contained in either or both of the first request and the second request. For instance, the first client may have a policy that is "no data blending by default." However, the client policy may also allow that this default may be overridden by a specific authorization contained in the first request. In still another example, operations of the method 500 may be expanded to include (and/or be combined with) operations described above in connection with the example method 200 of FIG. 2. Thus, these and other modifications are all contemplated within the scope of the present disclosure.
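The determination that data blending is permitted, including a "no data blending by default" client policy that a specific authorization in the request may override, might be sketched in Python as below. The `BlendingPolicy` class, the request parameter name `allow_blending`, and the `operator_allows` flag are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BlendingPolicy:
    """Hypothetical client policy on data blending."""
    blend_by_default: bool
    request_may_override: bool = False  # request may authorize blending despite the default


def blending_permitted(policy: BlendingPolicy, request: Dict[str, object],
                       operator_allows: bool = True) -> bool:
    """Decide whether one client's request may be blended with others."""
    if not operator_allows:
        return False  # operator policy of the data pipeline environment forbids blending
    if policy.blend_by_default:
        return True
    # "No data blending by default": only an explicit, permitted authorization unlocks it.
    return policy.request_may_override and request.get("allow_blending") is True


def blending_permitted_for_pair(p1: BlendingPolicy, r1: Dict[str, object],
                                p2: BlendingPolicy, r2: Dict[str, object],
                                operator_allows: bool = True) -> bool:
    """Blending of two requests is permitted only if it is permitted for both."""
    return (blending_permitted(p1, r1, operator_allows)
            and blending_permitted(p2, r2, operator_allows))
```

A check of this kind would run before the plan is determined, so that requests for which blending is not permitted can instead be fulfilled via their own information models.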

In addition, although not expressly specified above, one or more steps of the method 500 may include a storing, displaying and/or outputting step as required for a particular application. In other words, any data, records, fields, and/or intermediate results discussed in the method can be stored, displayed and/or outputted to another device as required for a particular application. Furthermore, operations, steps, or blocks in FIG. 5 that recite a determining operation or involve a decision do not necessarily require that both branches of the determining operation be practiced. In other words, one of the branches of the determining operation can be deemed as an optional step. However, the use of the term "optional step" is intended only to reflect different variations of a particular illustrative embodiment and is not intended to indicate that steps not labelled as optional steps are to be deemed essential steps. Furthermore, operations, steps or blocks of the above described method(s) can be combined, separated, and/or performed in a different order from that described above, without departing from the example embodiments of the present disclosure.

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described example embodiments, but should be defined only in accordance with the following claims and their equivalents.

What is claimed is:
 1. A method comprising: obtaining, by a processing system including at least one processor, a first request for a delivery of a first data set to at least a first destination; mapping, by the processing system, the first request to a first information model of a plurality of information models; obtaining, by the processing system, a second request for a delivery of a second data set to at least a second destination; mapping, by the processing system, the second request to a second information model of the plurality of information models; identifying, by the processing system, that at least a portion of data is a part of both the first data set and the second data set; determining, by the processing system, a plan for configuring data pipeline components for delivering the first data set to the at least the first destination and for delivering the second data set to the at least the second destination, wherein the plan comprises a combination of the first information model and the second information model, and wherein the plan comprises at least one modification to the combination of the first information model and the second information model; and configuring, by the processing system, the data pipeline components for delivering the first data set to the at least the first destination and for delivering the second data set to the at least the second destination in accordance with the plan.
 2. The method of claim 1, wherein the at least one modification comprises an omission of at least one data pipeline component that is present in at least one of the first information model or the second information model.
 3. The method of claim 1, wherein the at least one modification comprises an addition of at least one data pipeline component that is not present in the first information model and the second information model.
 4. The method of claim 3, wherein the at least one data pipeline component comprises a storage node.
 5. The method of claim 1, wherein the at least one modification comprises an alteration to at least one setting for at least one data pipeline component that is present in at least one of the first information model or the second information model.
 6. The method of claim 5, wherein the alteration to the at least one setting comprises changing a storage duration of at least the portion of the data at the at least one data pipeline component.
 7. The method of claim 5, wherein the alteration to the at least one setting comprises changing a location criteria for the at least one data pipeline component.
 8. The method of claim 1, wherein the at least one modification to the combination of the first information model and the second information model is selected for the plan based upon a determination of a reduction in an overall number of data pipeline components according to the plan as compared to the combination of the first information model and the second information model without the modification.
 9. The method of claim 1, wherein the at least one modification to the combination of the first information model and the second information model is selected for the plan based upon a determination of a reduction in a network bandwidth utilization according to the plan as compared to the combination of the first information model and the second information model without the modification.
 10. The method of claim 1, wherein the at least one modification to the combination of the first information model and the second information model is selected for the plan based upon a determination of a reduction in a latency of a delivery of at least one of the first data set or the second data set for the plan as compared to the combination of the first information model and the second information model without the modification.
 11. The method of claim 1, wherein the at least one modification to the combination of the first information model and the second information model is selected for the plan based upon a determination of a reduction in a cost of a delivery of at least one of the first data set or the second data set for the plan as compared to the combination of the first information model and the second information model without the modification.
 12. The method of claim 1, wherein the at least one modification is selected in accordance with an operator policy of an operator of the data pipeline environment.
 13. The method of claim 12, wherein the operator policy balances a reduction in an overall number of data pipeline components with a reduction in a latency of a delivery of at least one of the first data set or the second data set.
 14. The method of claim 1, further comprising: verifying that the at least one modification does not violate a client policy of a client associated with the first request or the second request.
 15. The method of claim 14, wherein the client policy is contained in one of the first request or the second request.
 16. The method of claim 14, wherein the client policy is maintained by the processing system on behalf of the client.
 17. The method of claim 14, wherein the client policy specifies a restriction on at least one of: a location of at least one data pipeline component; a sharing of the at least one data pipeline component; or an access of other clients to at least a portion of the first data set or the second data set.
 18. The method of claim 1, wherein the determining the plan is performed in response to a determination that a data blending is permitted.
 19. A non-transitory computer-readable medium storing instructions which, when executed by a processing system including at least one processor, cause the processing system to perform operations, the operations comprising: obtaining a first request for a delivery of a first data set to at least a first destination; mapping the first request to a first information model of a plurality of information models; obtaining a second request for a delivery of a second data set to at least a second destination; mapping the second request to a second information model of the plurality of information models; identifying that at least a portion of data is a part of both the first data set and the second data set; determining a plan for configuring data pipeline components for delivering the first data set to the at least the first destination and for delivering the second data set to the at least the second destination, wherein the plan comprises a combination of the first information model and the second information model, and wherein the plan comprises at least one modification to the combination of the first information model and the second information model; and configuring the data pipeline components for delivering the first data set to the at least the first destination and for delivering the second data set to the at least the second destination in accordance with the plan.
 20. An apparatus comprising: a processing system including at least one processor; and a computer-readable medium storing instructions which, when executed by the processing system, cause the processing system to perform operations, the operations comprising: obtaining a first request for a delivery of a first data set to at least a first destination; mapping the first request to a first information model of a plurality of information models; obtaining a second request for a delivery of a second data set to at least a second destination; mapping the second request to a second information model of the plurality of information models; identifying that at least a portion of data is a part of both the first data set and the second data set; determining a plan for configuring data pipeline components for delivering the first data set to the at least the first destination and for delivering the second data set to the at least the second destination, wherein the plan comprises a combination of the first information model and the second information model, and wherein the plan comprises at least one modification to the combination of the first information model and the second information model; and configuring the data pipeline components for delivering the first data set to the at least the first destination and for delivering the second data set to the at least the second destination in accordance with the plan.